We describe and expand upon the scalable randomized benchmarking protocol proposed in Phys. 
Rev. Lett. 106, 180504 (2011) which provides a method for benchmarking quantum gates and 
estimating the gate-dependence of the noise. The protocol allows the noise to have weak time and 
gate-dependence, and we provide a sufficient condition for the applicability of the protocol in terms 
of the average variation of the noise. We discuss how state preparation and measurement errors are 
taken into account and provide a complete proof of the scalability of the protocol. We establish a 
connection in special cases between the error rate provided by this protocol and the error strength 
measured using the diamond norm distance. 



I. INTRODUCTION 

Quantum computers promise an exponential speed-up over known classical
algorithms for problems such as factoring integers, finding solutions to
linear systems of equations [2], and simulating physical systems [3, 4].
Quantum error-correction methods have been devised for preserving quantum
information in the presence of noise [5-7], leading to the theoretical
development of a fault-tolerant theory of quantum computing [8-10]. Such a
theory promises that quantum computation is possible in the presence of
errors, provided the error rate is below a certain threshold value which
depends on the particular coding scheme used as well as the error model.
This potential has motivated much experimental research dedicated to
building a functioning quantum information processor, with various proposals
for possible implementations [11-14].



One of the main challenges in building a quantum information processor is
the non-scalability of completely characterizing the noise affecting a
quantum system via process tomography [15, 16]. A complete characterization
of the noise is useful because it allows for the determination of good
error-correction schemes, and thus the possibility of reliable transmission
of quantum information. Since complete process tomography is infeasible for
large systems, there is growing interest in scalable methods for partially
characterizing the noise affecting a quantum system [17-24].

In Ref. [25] we provided a scalable (in the number n of qubits comprising
the system) and robust method for benchmarking the full set of Clifford
gates by a single parameter using randomization techniques. The concept of
using randomization methods for benchmarking quantum gates, commonly called
randomized benchmarking (RB), was introduced previously in [26]. The
simplicity of these protocols has motivated experimental implementations in
atomic ions for different types of traps [26-28], NMR, superconducting
qubits [30, 31], and atoms in optical lattices [32]. Unfortunately there are
several drawbacks to the methods of [18, 26]. For instance, [18] assumes the
highly idealized situation of the noise being independent of the chosen
gate, in which case the fidelity decay curve averaged over randomly chosen
unitaries takes the form of an exponential (in the sequence length). The
protocol of [26] is limited to the single-qubit case and fits the observed
fidelity decay averaged over sequences of single-qubit gates (where each
gate consists of a random generator of the Clifford group composed with a
random Pauli operator) to an exponential. The decay rate is assumed to
provide an estimate of the average error probability per Clifford gate.
However, conditions for when the assumption of an exponential decay is
valid, specifically in the realistic case of gate-dependent and
time-dependent noise, were not given. Such a set of conditions would be
useful because it is easy to construct pathological examples where the
estimated decay rate is not reliable. An unphysical but intuitively simple
example is when the error is gate-dependent and equal to the exact inverse
of the target gate. The error rate given by the protocol is then always
equal to zero; however, in actuality there is substantial error on each gate
(see Sec. IV B). Other important shortcomings of these previous RB protocols
are that extensions to multi-qubit systems are either not scalable or not
well understood, and it is unclear how to explicitly account for state
preparation and measurement errors.

In this paper we give a full analysis of the scalable multi-qubit randomized
benchmarking protocol for Clifford gates we proposed in [25], which
overcomes the shortcomings described above. We note that since one "gate" in
the single-qubit protocol of [26] consists of a random Clifford generator as
well as a random Pauli operator, the cost of implementing a gate in this
scheme is 2. In the single-qubit case, our RB scheme can be implemented by
explicitly writing down the 24 elements of the Clifford group decomposed
into a sequence of the same generators that are randomly applied in [26].
The average number of generators in such a decomposition is 1.875, which
implies that even for the single-qubit case our protocol takes no more time
to implement than that of [26]. Hence, since our protocol is scalable and
produces an error estimate which overcomes the various shortcomings listed
above, it is reasonable to use it in preference to other existing schemes
regardless of the number of qubits comprising the system.

We provide a detailed proof that our protocol requires at most $O(n^2)$
quantum gates, $O(n^4)$ cost in classical pre-processing (to select each
gate-sequence), and a number of single-shot repetitions that is independent
of n. As well, we give a thorough explanation of the perturbative expansion
of the time and gate-dependent errors about the average error that leads to
the fitting models for the observed fidelity decay. Our zeroth order model
directly shows that for time-independent and gate-independent errors the
fidelity decay is indeed modeled by an exponential decay, and the decay rate
produces an estimate for the average error rate of the noise.

We derive the first order fitting model which takes into account the
first-order correction terms in the perturbative expansion and provide a
detailed explanation of the conditions for when this is a sufficient model
of the fidelity decay curve. The fitting formula shows that gate-dependent
errors can lead to a deviation from the exponential decay (defining a
partial test for such effects in the noise), which was illustrated via
numerical examples in [25]. State-preparation and measurement errors appear
as independent fit parameters in the fitting models and we discuss when the
protocol is robust against these errors. In the case of Pauli errors we give
some novel preliminary results regarding the relationship between the
benchmarking average error rate and the more common diamond norm error
measure [33, 34] used in fault-tolerant theory.

The paper is structured as follows: In section II we discuss notation and
background material. In section III A we discuss the proposed protocol and
then in section III B we present the perturbative expansion and expressions
for the zeroth and first order fitting models. Section IV provides a
sufficient condition for neglecting higher order terms in the model as well
as a simple case for when the benchmarking scheme fails. We also discuss
when the protocol is robust against state preparation and measurement
errors. Section V discusses the relationship between the error rate given by
the benchmarking scheme and other measures of error commonly used in quantum
information. Section VI provides a detailed proof that our protocol is
scalable in the number of qubits comprising the system, and a discussion
with concluding remarks is contained in section VII.



II. BACKGROUND

Let us first set some notation. Suppose we have an n-qubit quantum system so
that the Hilbert space $\mathcal{H}$ representing the system has dimension
$d = 2^n$. Thus $\mathcal{H}$ is isomorphic to $\mathbb{C}^d$ and both will
generically refer to the Hilbert space of a d-dimensional quantum system
throughout the presentation. The set of linear operators on $\mathcal{H}$
will be denoted by $L(\mathcal{H})$. The set of pure states is represented
by complex projective space $\mathbb{CP}^{d-1}$ and the set of all mixed
states in $L(\mathcal{H})$, denoted by $\mathcal{D}(\mathcal{H})$, is given
by the set of non-negative, trace-1 linear operators on $\mathcal{H}$.
Unless otherwise stated, we will only be concerned with quantum operations
with the same input and output spaces. The set of linear superoperators
mapping $L(\mathcal{H})$ into itself is denoted by $T(\mathcal{H})$ with the
set of quantum channels (completely positive, trace-preserving linear maps)
contained in $T(\mathcal{H})$ denoted by $S(\mathcal{H})$.

There are various methods for quantifying the distance between quantum
operations; we briefly describe those that will be of use to us. Good
references for many of the topics in this section are [35-37].

A. Diamond Norm, Average Gate Fidelity and Minimum Gate Fidelity

One method of quantifying the distance between two linear superoperators
$\mathcal{E}_1, \mathcal{E}_2 \in T(\mathcal{H})$ is given by the diamond
norm distance, $\|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond$. The diamond norm
of an arbitrary linear superoperator
$\mathcal{R}: L(\mathbb{C}^m) \to L(\mathbb{C}^n)$ is defined as,

$$\|\mathcal{R}\|_\diamond = \sup_{k \in \mathbb{N}} \|\mathcal{R} \otimes \mathcal{I}_k\|_1 \tag{2.1}$$

where $\|\cdot\|_1$ on superoperators is defined to be the operator norm
induced by the trace norm $\|\cdot\|_1$ on $L(\mathbb{C}^m)$ and
$L(\mathbb{C}^n)$. It is known that the supremum occurs for k = m and so,

$$\|\mathcal{R}\|_\diamond = \|\mathcal{R} \otimes \mathcal{I}_m\|_1 = \max_{A : \|A\|_1 \le 1} \|\mathcal{R} \otimes \mathcal{I}_m (A)\|_1 \tag{2.2}$$

where $A \in L(\mathbb{C}^m \otimes \mathbb{C}^m)$. Hence for
$\mathcal{E}_1, \mathcal{E}_2 \in T(\mathcal{H})$,

$$\|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond = \|(\mathcal{E}_1 - \mathcal{E}_2) \otimes \mathcal{I}_d\|_1. \tag{2.3}$$

The diamond norm distance is commonly used in quantum information due to its
operational meaning of being related to the optimal probability for
distinguishing $\mathcal{E}_1$ and $\mathcal{E}_2$ using a binary outcome
POVM and single input state (allowing for ancillas) [38].

Another method for quantifying the distance between linear superoperators is
given by the $\|\cdot\|_{1\to1}^{H}$ norm defined for linear superoperator
$\mathcal{R}: L(\mathbb{C}^m) \to L(\mathbb{C}^n)$ as,

$$\|\mathcal{R}\|_{1\to1}^{H} = \max_{A : A = A^\dagger,\ \|A\|_1 \le 1} \|\mathcal{R}(A)\|_1 \tag{2.4}$$

where $A \in L(\mathbb{C}^m)$. One can see that $\|\cdot\|_{1\to1}^{H}$ is
just $\|\cdot\|_1$ (which is also denoted $\|\cdot\|_{1\to1}$) restricted to
Hermitian inputs. This norm is less common in quantum information due to its
lack of operational meaning, however it is a weaker measure of distance than
the diamond norm since for any linear superoperator
$\mathcal{R}: L(\mathbb{C}^m) \to L(\mathbb{C}^n)$,
$\|\mathcal{R}\|_{1\to1}^{H} \le \|\mathcal{R}\|_\diamond$. This will be of
much use to us later when we consider neglecting higher order effects in the
benchmarking scheme.
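A commonly used state-dependent measure for comparing quantum operations
$\mathcal{E}_1, \mathcal{E}_2 \in S(\mathcal{H})$ is given by the channel
fidelity,

$$\mathcal{F}_{\mathcal{E}_1,\mathcal{E}_2}(\rho) = F(\mathcal{E}_1(\rho), \mathcal{E}_2(\rho)) = \left(\mathrm{tr}\sqrt{\sqrt{\mathcal{E}_1(\rho)}\,\mathcal{E}_2(\rho)\,\sqrt{\mathcal{E}_1(\rho)}}\right)^2 \tag{2.5}$$

where "F" refers to the usual fidelity between quantum states [39]. In the
case of a unitary operation $\mathcal{U}$, quantum operation $\mathcal{E}$,
and restricting input states to $\mathbb{CP}^{d-1}$, the channel fidelity is
called the gate fidelity. Explicitly, for a pure state $|\phi\rangle$,

$$\mathcal{F}_{\mathcal{E},\mathcal{U}}(\phi) = \mathrm{tr}\left(\mathcal{U}(|\phi\rangle\langle\phi|)\,\mathcal{E}(|\phi\rangle\langle\phi|)\right), \tag{2.6}$$

and defining $\Lambda = \mathcal{U}^\dagger \circ \mathcal{E}$ gives,

$$\mathcal{F}_{\mathcal{E},\mathcal{U}}(\phi) = \mathcal{F}_{\Lambda,\mathcal{I}}(\phi) = \mathrm{tr}\left(|\phi\rangle\langle\phi|\,\Lambda(|\phi\rangle\langle\phi|)\right). \tag{2.7}$$

The channel $\Lambda$ can be thought of as representing how much
$\mathcal{E}$ deviates from $\mathcal{U}$ in that if
$\mathcal{E} = \mathcal{U}$ then $\Lambda = \mathcal{I}$. The gate fidelity
has many nice mathematical properties including a simple expression for the
average over pure states, expressions for the variance in terms of various
representations of $\Lambda$ and a concentration of measure phenomenon for
large systems [40-42]. The average gate fidelity is obtained by integrating
$\mathcal{F}_{\mathcal{E},\mathcal{U}}$ over $\mathbb{CP}^{d-1}$ using the
Fubini-Study measure $\mu_{FS}$ [43],

$$\overline{\mathcal{F}_{\mathcal{E},\mathcal{U}}} = \overline{\mathcal{F}_{\Lambda,\mathcal{I}}} = \int \mathrm{tr}\left(|\phi\rangle\langle\phi|\,\Lambda(|\phi\rangle\langle\phi|)\right) d\mu_{FS}(\phi). \tag{2.8}$$

Taking the minimum of $\mathcal{F}_{\mathcal{E}_1,\mathcal{E}_2}$ over all
mixed states $\rho$ produces a quantity
$\mathcal{F}^{\min}_{\mathcal{E}_1,\mathcal{E}_2}$, commonly called the
minimum channel fidelity,

$$\mathcal{F}^{\min}_{\mathcal{E}_1,\mathcal{E}_2} = \min_{\rho} \mathcal{F}_{\mathcal{E}_1,\mathcal{E}_2}(\rho).$$

Note that by concavity of the fidelity, the minimum channel fidelity occurs
at a pure state [39]. In the case of the gate fidelity, the minimum is
called the minimum gate fidelity.

In certain cases we will be concerned with how close $\mathcal{E}_1$ and
$\mathcal{E}_2$ are in terms of the difference between the average fidelity
of each channel. To this end we define,

$$\Delta F(\mathcal{E}_1, \mathcal{E}_2) := \left|\overline{\mathcal{F}_{\mathcal{E}_1,\mathcal{I}}} - \overline{\mathcal{F}_{\mathcal{E}_2,\mathcal{I}}}\right|. \tag{2.9}$$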

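Lastly, we note the following relationships between some of the distance
measures defined above. First, for
$\mathcal{E}_1, \mathcal{E}_2 \in S(\mathcal{H})$ the following inequalities
hold,

$$\Delta F(\mathcal{E}_1, \mathcal{E}_2) \le \|\mathcal{E}_1 - \mathcal{E}_2\|_{1\to1}^{H} \le \|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond \tag{2.10}$$

where we recall the definition of $\|\cdot\|_{1\to1}^{H}$ in Eq. (2.4). The
second inequality is clear since,

$$\|\mathcal{E}_1 - \mathcal{E}_2\|_{1\to1}^{H} \le \|\mathcal{E}_1 - \mathcal{E}_2\|_1 \le \|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond. \tag{2.11}$$

Now for the first inequality note that,

$$\begin{aligned}
\Delta F(\mathcal{E}_1, \mathcal{E}_2) &\le \max_{|\phi\rangle} \left|\mathrm{tr}\left((\mathcal{E}_1 - \mathcal{E}_2)(|\phi\rangle\langle\phi|)\,|\phi\rangle\langle\phi|\right)\right| \\
&\le \max_{|\phi\rangle} \left\|(\mathcal{E}_1 - \mathcal{E}_2)(|\phi\rangle\langle\phi|)\right\|_\infty \\
&= \max_{A : A = A^\dagger,\ \|A\|_1 \le 1} \left\|(\mathcal{E}_1 - \mathcal{E}_2)(A)\right\|_\infty \\
&= \|\mathcal{E}_1 - \mathcal{E}_2\|_{1\to\infty}^{H}
\end{aligned} \tag{2.12}$$

where we note that since $\mathcal{E}_1$ and $\mathcal{E}_2$ are completely
positive, $\mathcal{E}_1 - \mathcal{E}_2$ is Hermiticity-preserving. Hence
since
$\|\mathcal{E}_1 - \mathcal{E}_2\|_{1\to\infty}^{H} \le \|\mathcal{E}_1 - \mathcal{E}_2\|_{1\to1}^{H}$
the inequalities in Eq. (2.10) hold.

Next we show that for any quantum operations
$\mathcal{E}_1, \mathcal{E}_2 \in S(\mathcal{H})$,

$$\mathcal{F}^{\min}_{\mathcal{E}_1,\mathcal{E}_2} \ge 1 - \|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond. \tag{2.13}$$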

We have that, 

(2.14) 

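By the Fuchs-van de Graaf inequalities [44],

$$\left\|\mathcal{E}_1 \otimes \mathcal{I}(|\psi\rangle\langle\psi|) - \mathcal{E}_2 \otimes \mathcal{I}(|\psi\rangle\langle\psi|)\right\|_1 \ge 1 - F\left(\mathcal{E}_1 \otimes \mathcal{I}(|\psi\rangle\langle\psi|), \mathcal{E}_2 \otimes \mathcal{I}(|\psi\rangle\langle\psi|)\right) \tag{2.15}$$

so,

$$\begin{aligned}
\|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond &\ge \max_{|\psi\rangle \in \mathcal{H}\otimes\mathcal{H}} \left[1 - F\left(\mathcal{E}_1 \otimes \mathcal{I}(|\psi\rangle\langle\psi|), \mathcal{E}_2 \otimes \mathcal{I}(|\psi\rangle\langle\psi|)\right)\right] \\
&= 1 - \min_{|\psi\rangle \in \mathcal{H}\otimes\mathcal{H}} F\left(\mathcal{E}_1 \otimes \mathcal{I}(|\psi\rangle\langle\psi|), \mathcal{E}_2 \otimes \mathcal{I}(|\psi\rangle\langle\psi|)\right).
\end{aligned} \tag{2.16}$$

Now we have,

$$\min_{|\psi\rangle \in \mathcal{H}\otimes\mathcal{H}} F\left(\mathcal{E}_1 \otimes \mathcal{I}(|\psi\rangle\langle\psi|), \mathcal{E}_2 \otimes \mathcal{I}(|\psi\rangle\langle\psi|)\right) \le \min_{|\phi\rangle \in \mathcal{H}} F\left(\mathcal{E}_1(|\phi\rangle\langle\phi|), \mathcal{E}_2(|\phi\rangle\langle\phi|)\right) \tag{2.17}$$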

since 
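$$\begin{aligned}
&\min_{|\psi\rangle \in \mathcal{H}\otimes\mathcal{H}} F\left(\mathcal{E}_1 \otimes \mathcal{I}(|\psi\rangle\langle\psi|), \mathcal{E}_2 \otimes \mathcal{I}(|\psi\rangle\langle\psi|)\right) \\
&\quad\le \min_{|\phi\rangle \in \mathcal{H}} F\left(\mathcal{E}_1 \otimes \mathcal{I}(|\phi\rangle\langle\phi| \otimes |0\rangle\langle 0|), \mathcal{E}_2 \otimes \mathcal{I}(|\phi\rangle\langle\phi| \otimes |0\rangle\langle 0|)\right) \\
&\quad= \min_{|\phi\rangle \in \mathcal{H}} \left(\mathrm{tr}\sqrt{\left(\sqrt{\mathcal{E}_1(|\phi\rangle\langle\phi|)} \otimes |0\rangle\langle 0|\right)\left(\mathcal{E}_2(|\phi\rangle\langle\phi|) \otimes |0\rangle\langle 0|\right)\left(\sqrt{\mathcal{E}_1(|\phi\rangle\langle\phi|)} \otimes |0\rangle\langle 0|\right)}\right)^2 \\
&\quad= \min_{|\phi\rangle \in \mathcal{H}} \left(\mathrm{tr}\sqrt{\sqrt{\mathcal{E}_1(|\phi\rangle\langle\phi|)}\,\mathcal{E}_2(|\phi\rangle\langle\phi|)\,\sqrt{\mathcal{E}_1(|\phi\rangle\langle\phi|)}}\right)^2 .
\end{aligned} \tag{2.18}$$

So,

$$\|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond \ge 1 - \min_{|\phi\rangle \in \mathcal{H}} F\left(\mathcal{E}_1(|\phi\rangle\langle\phi|), \mathcal{E}_2(|\phi\rangle\langle\phi|)\right). \tag{2.19}$$

Now by concavity,

$$\mathcal{F}^{\min}_{\mathcal{E}_1,\mathcal{E}_2} = \min_{|\phi\rangle} F\left(\mathcal{E}_1(|\phi\rangle\langle\phi|), \mathcal{E}_2(|\phi\rangle\langle\phi|)\right) \tag{2.20}$$

and so,

$$\mathcal{F}^{\min}_{\mathcal{E}_1,\mathcal{E}_2} \ge 1 - \|\mathcal{E}_1 - \mathcal{E}_2\|_\diamond. \tag{2.21}$$

B. The Clifford Group and t-Designs

The Clifford group on n qubits, denoted $\mathrm{Clif}_n$, is defined as the
normalizer of the Pauli group $\mathcal{P}_n$ and is generated by the phase
(S), Hadamard (H) and controlled-NOT (CNOT) gates. $\mathrm{Clif}_n$ plays an
important role in many areas of quantum information such as universality
[45], stabilizer code theory/fault-tolerance [46] and noise estimation.

One extremely useful property of $\mathrm{Clif}_n$, especially for noise
estimation, is that the uniform probability distribution over
$\mathrm{Clif}_n$ comprises a unitary 2-design [17]. A unitary t-design is
defined as follows,

Definition 1. Unitary t-Design

A unitary t-design is a discrete random variable
$\{(q_1, U_1), \dots, (q_K, U_K)\}$, with each $U_i \in U(d)$, such that for
every homogeneous complex-valued polynomial p in $2d^2$ indeterminates of
degree (s,s) less than or equal to (t,t),

$$\frac{1}{K}\sum_{j=1}^{K} p(U_j) = \int_{U(d)} p(U)\, dU. \tag{2.22}$$

The integral is taken with respect to the Haar measure on U(d). Here p(U) is
defined to be the evaluation of p at the $2d^2$ values consisting of the
$d^2$ matrix entries of U as well as the $d^2$ complex conjugates of these
matrix entries.

In the case t = 2 the above reduces to a "twirling" [47] condition,

$$\frac{1}{K}\sum_{j=1}^{K} U_j \Lambda\left(U_j^\dagger \rho\, U_j\right) U_j^\dagger = \int_{U(d)} U \Lambda\left(U^\dagger \rho\, U\right) U^\dagger\, dU \tag{2.23}$$

being satisfied for any quantum channel $\Lambda$ and any state $\rho$ [17].
Since a uniform probability distribution on $\mathrm{Clif}_n$ forms a
2-design, if
$\mathrm{Clif}_n = \{\mathcal{C}_j : j \in K = \{1, \dots, |\mathrm{Clif}_n|\}\}$
then,

$$\frac{1}{|\mathrm{Clif}_n|}\sum_{j=1}^{|\mathrm{Clif}_n|} \mathcal{C}_j \Lambda\left(\mathcal{C}_j^\dagger \rho\, \mathcal{C}_j\right) \mathcal{C}_j^\dagger = \int_{U(d)} U \Lambda\left(U^\dagger \rho\, U\right) U^\dagger\, dU. \tag{2.24}$$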

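As shown in 0, HJ,
$\int_{U(d)} U \Lambda\left(U^\dagger \rho\, U\right) U^\dagger\, dU$ produces
the unique depolarizing channel $\Lambda_d$ with the same average fidelity as
$\Lambda$. Hence if $\overline{\mathcal{F}_{\Lambda,\mathcal{I}}}$ is the
average fidelity of $\Lambda$, and $\Lambda_d$ is given by

$$\Lambda_d(\rho) = p\rho + (1-p)\frac{\mathbb{1}}{d} \tag{2.25}$$

then,

$$\overline{\mathcal{F}_{\Lambda,\mathcal{I}}} = p + \frac{(1-p)}{d}. \tag{2.26}$$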



Thus twirling a quantum operation over the Clifford 
group produces a depolarizing channel and the average 
fidelity is invariant under the twirling operation. 
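
The depolarizing effect of the Clifford twirl can be checked numerically in the single-qubit case. The sketch below is only an illustration under stated assumptions: it represents a unital qubit channel by its 3x3 Bloch-vector matrix M, represents each Clifford element by the rotation it induces on the Pauli axes, and uses a hypothetical error (a Pauli channel followed by a small Z over-rotation) chosen for this example. Under these assumptions the twirl of M should be (tr M / 3) times the identity, so that p = tr M / 3.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested sequences."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

# Action of the generators on the Bloch axes (columns are images of X, Y, Z).
# Hadamard: X -> Z, Y -> -Y, Z -> X.  Phase gate S: X -> Y, Y -> -X, Z -> Z.
HADAMARD = [[0, 0, 1], [0, -1, 0], [1, 0, 0]]
PHASE = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]

def clifford_rotations():
    """Close {H, S} under multiplication; returns the distinct Bloch rotations."""
    identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    seen = {identity}
    frontier = [identity]
    while frontier:
        new = []
        for g in frontier:
            for gen in (HADAMARD, PHASE):
                h = tuple(tuple(row) for row in matmul(gen, g))
                if h not in seen:
                    seen.add(h)
                    new.append(h)
        frontier = new
    return sorted(seen)

def twirl(bloch_matrix):
    """Average R^T M R over the single-qubit Clifford rotations."""
    group = clifford_rotations()
    total = [[0.0] * 3 for _ in range(3)]
    for rot in group:
        term = matmul(transpose(rot), matmul(bloch_matrix, rot))
        for i in range(3):
            for j in range(3):
                total[i][j] += term[i][j] / len(group)
    return total

# Hypothetical error: Pauli channel with (px, py, pz) = (0.01, 0.02, 0.03),
# followed by a small over-rotation about Z.
theta = 0.1
rz = [[math.cos(theta), -math.sin(theta), 0.0],
      [math.sin(theta), math.cos(theta), 0.0],
      [0.0, 0.0, 1.0]]
pauli_part = [[0.90, 0.0, 0.0], [0.0, 0.92, 0.0], [0.0, 0.0, 0.94]]
M = matmul(rz, pauli_part)

twirled = twirl(M)
p = sum(M[i][i] for i in range(3)) / 3   # depolarizing parameter of the twirl
avg_fidelity = p + (1 - p) / 2           # Eq. (2.26) with d = 2
print(len(clifford_rotations()), p, avg_fidelity)
```

The group is enumerated modulo Pauli operators and phases, which is sufficient here because only the induced rotation enters the twirl of a unital channel. Non-unital channels would need the translation part of the affine Bloch map as well, which this sketch omits.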

In Sec. III we will be concerned with compositions of both gate-independent
and gate-dependent twirls. In the gate-independent case, the sequence of
twirls of $\Lambda$ of length k, $W(\Lambda)^k$, can be re-written as the
k-fold composition of $\Lambda_d$ with itself. Using the above
representation of $\Lambda_d$ we get,

$$W(\Lambda)^k(\rho) = p^k \rho + (1 - p^k)\frac{\mathbb{1}}{d}. \tag{2.27}$$

Therefore the average fidelity decreases exponentially to $\frac{1}{d}$
since,

$$\overline{\mathcal{F}_{W(\Lambda)^k,\mathcal{I}}} = p^k + \frac{(1-p^k)}{d}. \tag{2.28}$$
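
As a small illustration of the exponential decay, the following sketch composes a depolarizing channel with itself k times and compares the overlap with the input state against the closed form $p^k + (1-p^k)/d$. The helper names and the value p = 0.95 are assumptions made for this example only.

```python
def depolarize(rho, p, d=2):
    """Apply rho -> p*rho + (1-p)*I/d to a d x d matrix given as nested lists."""
    return [[p * rho[i][j] + ((1 - p) / d if i == j else 0.0)
             for j in range(d)] for i in range(d)]

def fidelity_after_k(p, k, d=2):
    """Closed form for the fidelity after k depolarizing steps."""
    return p**k + (1 - p**k) / d

rho0 = [[1.0, 0.0], [0.0, 0.0]]   # input state |0><0|
p = 0.95

state = rho0
overlaps = []
for k in range(1, 6):
    state = depolarize(state, p)
    overlaps.append(state[0][0])  # <0| state |0>

closed_form = [fidelity_after_k(p, k) for k in range(1, 6)]
print(overlaps)
print(closed_form)
```

For a depolarizing channel the gate fidelity is the same for every pure input state, so a single input state is enough to reproduce the average.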



We can also write the average fidelity of $\Lambda$ in terms of its
$\chi$-matrix [15]. The $\chi$-matrix is an important (basis-dependent)
object in experimental quantum information as it is directly related to
practical methods in process tomography. The $\chi$-matrix is obtained by
expanding the Kraus operators $\{A_k\}$ of $\Lambda$ with respect to a
particular basis of $L(\mathbb{C}^d)$, which is most often chosen to be the
Pauli basis $\{P_j\}_{j=0}^{d^2-1}$ ($P_0 = \mathbb{1}$). This gives,

$$\Lambda(\rho) = \sum_{i,j=0}^{d^2-1} \chi_{i,j}\, P_i\, \rho\, P_j \tag{2.29}$$

and so a complete description for $\Lambda$ can be given by estimating the
entries of $\chi$. As shown in [15],

$$\overline{\mathcal{F}_{\Lambda,\mathcal{I}}} = \frac{\chi_{0,0}\, d + 1}{d + 1} \tag{2.30}$$

which gives,

$$\chi_{0,0} = \frac{\overline{\mathcal{F}_{\Lambda,\mathcal{I}}}\,(d+1) - 1}{d}. \tag{2.31}$$

Therefore the (0, 0) entry of the $\chi$-matrix for a quantum operation with
respect to the Pauli basis is invariant under twirling over a 2-design.
Moreover $\chi_{0,0}$ for $\Lambda_d^k$ decreases to $\frac{1}{d^2}$
exponentially in k.
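
The relation between $\chi_{0,0}$ and the average fidelity can be illustrated for a single-qubit Pauli channel, for which $\chi_{0,0}$ is the probability that no Pauli error occurs. The sketch below assumes the standard relation $\overline{\mathcal{F}} = (\chi_{0,0} d + 1)/(d+1)$ and compares it with a direct average of the gate fidelity over the six Pauli eigenstates, which is assumed here to reproduce the average over all pure states because those states form a state 2-design. The error probabilities are hypothetical.

```python
def avg_fidelity_from_chi00(chi00, d):
    """Average fidelity from the (0,0) chi-matrix entry in the Pauli basis."""
    return (chi00 * d + 1) / (d + 1)

def avg_fidelity_six_states(px, py, pz):
    """Average gate fidelity of a qubit Pauli channel over the six Pauli eigenstates."""
    # Bloch-vector contraction factors of the Pauli channel along X, Y, Z.
    diag = (1 - 2 * (py + pz), 1 - 2 * (px + pz), 1 - 2 * (px + py))
    axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    total = 0.0
    for n in axes:
        # For a pure state with Bloch vector n the gate fidelity is (1 + n.Mn)/2.
        overlap = sum(n[i] * diag[i] * n[i] for i in range(3))
        total += (1 + overlap) / 2
    return total / len(axes)

px, py, pz = 0.01, 0.02, 0.03      # hypothetical Pauli error probabilities
chi00 = 1 - (px + py + pz)
from_chi = avg_fidelity_from_chi00(chi00, d=2)
from_states = avg_fidelity_six_states(px, py, pz)
print(from_chi, from_states)
```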



III. RANDOMIZED BENCHMARKING 

In this section we present both the protocol and a full derivation of the
fitting models for randomized benchmarking that were given in [25]. First,
we set some notation and make various definitions that will be used
throughout the presentation.

Denote the elements of $\mathrm{Clif}_n$ by $\mathcal{C}_i$ and the maximum
sequence length of applying Clifford gates by M. Suppose that the actual
implementation of $\mathcal{C}_i$ at time j ($1 \le j \le M$) results in the
map $\mathcal{E}_{i,j}$ with
$\mathcal{E}_{i,j} = \Lambda_{i,j} \circ \mathcal{C}_i$ for some error map
$\Lambda_{i,j}$. Hence to each Clifford $\mathcal{C}_i$ we associate a
sequence $(\Lambda_{i,1}, \dots, \Lambda_{i,M})$ which represents the
time-dependent noise operators affecting $\mathcal{C}_i$. We define the
average error operator as follows,

Definition 2. Average Error Operator

The average error operator affecting the gates in $\mathrm{Clif}_n$ is given
by,

$$\Lambda = \frac{1}{M |\mathrm{Clif}_n|} \sum_{j=1}^{M} \sum_{i=1}^{|\mathrm{Clif}_n|} \Lambda_{i,j}. \tag{3.1}$$

Consider the twirl of the average error operator over $\mathrm{Clif}_n$. As
discussed in Sec. II B this produces a depolarized channel $\Lambda_d$,

$$\Lambda_d(\rho) = \frac{1}{|\mathrm{Clif}_n|} \sum_{i=1}^{|\mathrm{Clif}_n|} \mathcal{C}_i^\dagger \circ \Lambda \circ \mathcal{C}_i(\rho) = p\rho + (1-p)\frac{\mathbb{1}}{d}. \tag{3.2}$$

Recall from Sec. II B that the average fidelity of $\Lambda$, denoted
$F_{ave}$, is invariant under Clifford twirling and so,

$$F_{ave} = p + \frac{1-p}{d}. \tag{3.3}$$

We now define the average error rate of the set of Clifford gates as
follows:

Definition 3. Average Error Rate

The average error rate, r, of the Clifford gates used in a quantum
computation is defined to be,

$$r = 1 - F_{ave} = 1 - \left(p + \frac{1-p}{d}\right) = \frac{(d-1)(1-p)}{d}. \tag{3.4}$$



It is important to note that r defined above should not be confused with the
"error rate", $r_{\mathcal{P}}$, of a Pauli channel $\mathcal{P}$. For a
Pauli channel $\mathcal{P}$, $r_{\mathcal{P}}$ is defined to be the
probability that a non-identity Pauli operator is applied to the input
state. Conditioning on a non-identity Pauli being applied, there is still a
non-zero probability of the input state being unchanged. Subtracting this
probability out gives our defined parameter r for $\mathcal{P}$, which is
commonly called the "infidelity" of $\mathcal{P}$. One can show that r and
$r_{\mathcal{P}}$ are related via $r_{\mathcal{P}} = \frac{(d+1)\,r}{d}$.
Following the terminology set in [26] we will call r the (average)
error-rate of $\Lambda$ and note that in the case where $\Lambda$ is a Pauli
channel, r is equal to the infidelity of $\Lambda$.
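
The conversion between the depolarizing parameter p, the average fidelity, the error rate r, and the Pauli error probability can be written out as a few helper functions. This is a minimal sketch: the function names are hypothetical, and the relation $r_{\mathcal{P}} = (d+1)r/d$ is assumed and then cross-checked against a direct count for the single-qubit depolarizing channel, which applies each of X, Y, Z with probability (1-p)/4.

```python
def average_fidelity(p, d):
    """Average fidelity of a depolarizing channel with parameter p in dimension d."""
    return p + (1 - p) / d

def error_rate(p, d):
    """Average error rate r = 1 - F_ave = (d-1)(1-p)/d."""
    return (d - 1) * (1 - p) / d

def pauli_error_probability(r, d):
    """Probability of a non-identity Pauli, assuming r_P = (d+1) r / d."""
    return (d + 1) * r / d

p, d = 0.98, 2                      # hypothetical single-qubit example
r = error_rate(p, d)
r_pauli = pauli_error_probability(r, d)

# Direct count: p*rho + (1-p)*I/2 applies X, Y, Z each with probability (1-p)/4.
direct = 3 * (1 - p) / 4
print(r, r_pauli, direct)
```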

The parameter r is the figure of merit we want to be able to estimate
experimentally. One can estimate p directly using any of standard process
tomography [15], ancilla-assisted/entanglement-assisted process tomography
[H or Monte-Carlo methods HI, Hi]. The tomography based schemes suffer from
the unrealistic assumptions of negligible state-preparation and measurement
errors, and clean ancillary states/operations. These schemes also require
exponential time resources in n, making them infeasible for even relatively
small numbers of qubits. The Monte-Carlo methods also have the drawback of
assuming negligible state-preparation and measurement errors. The advantages
of these methods are that the average fidelity of each gate can be estimated
and the scheme is efficient in n.

The experimentally relevant challenge therefore is to estimate p while
relaxing the assumptions on state preparation, measurement and ancillary
states/processes. Ideally, such a method should also scale efficiently with
the number of qubits. As we show below, such an estimate can be obtained
through benchmarking the performance of random circuits.



A. Protocol 

For a fixed sequence length $m \le M - 1$, the benchmarking protocol consists
of choosing $K_m$ sequences of independent and identically distributed
uniformly random Clifford elements and calculating the fidelity of the
average of the $K_m$ sequences. One repeats this procedure for different
values of m and fits the fidelity decay curve to the models we derive below.
More precisely, the protocol is as follows,

Fix an initial state $|\psi\rangle$ and perform the following steps:

Step 1. Fix $m \le M - 1$ and generate $K_m$ sequences consisting of m + 1
quantum operations. The first m operations are chosen uniformly at random
from $\mathrm{Clif}_n$ and the m + 1'th operation is uniquely determined as
the inverse gate of the composition of the first m. By assumption each
operation $\mathcal{C}_{i_j}$ is allowed to have some error, represented by
$\Lambda_{i_j,j}$, and each sequence can be modelled by the operation,

$$S_{\mathbf{i}_m} = \bigcirc_{j=1}^{m+1} \left(\Lambda_{i_j,j} \circ \mathcal{C}_{i_j}\right), \tag{3.5}$$

where $\mathbf{i}_m$ is the m-tuple $(i_1, \dots, i_m)$ (which we sometimes
also denote by $\vec{i}_m$) and $i_{m+1}$ is uniquely determined by
$\mathbf{i}_m$.

Step 2. For each of the $K_m$ sequences, measure the survival probability
$\mathrm{Tr}[E_\psi S_{\mathbf{i}_m}(\rho_\psi)]$. Here $\rho_\psi$ is a
quantum state that takes into account errors in preparing
$|\psi\rangle\langle\psi|$ and $E_\psi$ is the POVM element that takes into
account measurement errors. In the ideal (noise-free) case
$\rho_\psi = E_\psi = |\psi\rangle\langle\psi|$.

Step 3. Average over the $K_m$ random realizations to find the averaged
sequence fidelity,

$$F_{seq}(m, \psi) = \mathrm{Tr}[E_\psi S_{K_m}(\rho_\psi)], \tag{3.6}$$

where

$$S_{K_m} = \frac{1}{K_m} \sum_{\mathbf{i}_m} S_{\mathbf{i}_m} \tag{3.7}$$

is the average sequence operation.

Step 4. Repeat Steps 1 through 3 for different values of m and fit the
results for the averaged sequence fidelity (defined in Eq. (3.6)) to the
model

$$\mathcal{F}_g^{(1)}(m, |\psi\rangle) = A_1 p^m + B_1 + C_1 (m-1)(q - p^2) p^{m-2} \tag{3.8}$$

derived below. The coefficients $A_1$, $B_1$, and $C_1$ absorb the state
preparation and measurement errors as well as the error on the final gate.
The difference $q - p^2$ is a measure of the degree of gate-dependence in
the errors, and p determines the average error-rate r according to the
relation given by Eq. (3.4). In the case of gate-independent and
time-independent errors the results will fit the simpler model

$$\mathcal{F}_g^{(0)}(m, |\psi\rangle) = A_0 p^m + B_0 \tag{3.9}$$

also derived below, where $A_0$ and $B_0$ absorb state preparation and
measurement errors as well as the error on the final gate.
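
To make the role of the fit concrete, the sketch below generates noiseless data from the simpler model $A_0 p^m + B_0$ and recovers p, and hence r, from fidelities at three consecutive sequence lengths. This ratio estimator is only an illustrative stand-in chosen to keep the example free of dependencies; it is not the fitting procedure of the protocol, and with real (noisy) data a least-squares fit over many values of m would be the natural choice. All numerical values are hypothetical.

```python
def zeroth_order_model(m, A0, B0, p):
    """Averaged sequence fidelity A0 * p**m + B0 of the gate-independent model."""
    return A0 * p**m + B0

def estimate_p(f0, f1, f2):
    """Estimate p from fidelities at three consecutive lengths m, m+1, m+2.

    Successive differences of A0 * p**m + B0 form a geometric sequence with
    ratio p, so A0 and B0 cancel in the quotient.
    """
    return (f2 - f1) / (f1 - f0)

def error_rate(p, d):
    """Average error rate r = (d-1)(1-p)/d."""
    return (d - 1) * (1 - p) / d

A0, B0, p_true, d = 0.48, 0.50, 0.97, 2   # hypothetical fit parameters
data = [zeroth_order_model(m, A0, B0, p_true) for m in (1, 2, 3)]
p_est = estimate_p(*data)
r_est = error_rate(p_est, d)
print(p_est, r_est)
```

Because $A_0$ and $B_0$ drop out of the estimate, the recovered p does not depend on the state preparation and measurement errors that those coefficients absorb.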

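We note that for each m, in the limit of $K_m \to \infty$,
$F_{seq}(m, \psi)$ converges to the exact (uniform) average,
$\mathcal{F}_g(m, \psi)$, over all sequences,

$$\mathcal{F}_g(m, \psi) = \mathrm{Tr}[E_\psi S_m(\rho_\psi)] \tag{3.10}$$

where we define the exact average of the sequences to be,

$$S_m = \frac{1}{|\mathrm{Clif}_n|^m} \sum_{(i_1, \dots, i_m)} \Lambda_{i_{m+1},m+1} \circ \mathcal{C}_{i_{m+1}} \circ \dots \circ \Lambda_{i_1,1} \circ \mathcal{C}_{i_1}. \tag{3.11}$$

Hence the fitting functions by which we model the behavior of
$F_{seq}(m, \psi)$ are derived in terms of $\mathcal{F}_g(m, \psi)$ (see
Sec. III B). Note that since $\mathcal{F}_g(m, \psi)$ is the uniform average
over all sequences we can sum over each index independently,

$$\mathcal{F}_g(m, \psi) = \frac{1}{|\mathrm{Clif}_n|^m} \sum_{i_1, \dots, i_m} \mathrm{tr}\left(\Lambda_{i_{m+1},m+1} \circ \mathcal{C}_{i_{m+1}} \circ \Lambda_{i_m,m} \circ \mathcal{C}_{i_m} \circ \dots \circ \Lambda_{i_1,1} \circ \mathcal{C}_{i_1}(\rho_\psi)\, E_\psi\right). \tag{3.12}$$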



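In order to prepare for the next section where we derive the above fitting
models, we write $\mathcal{F}_g(m, \psi)$ in a more intuitive form. We first
re-write
$\Lambda_{i_{m+1},m+1} \circ \mathcal{C}_{i_{m+1}} \circ \dots \circ \Lambda_{i_1,1} \circ \mathcal{C}_{i_1}$
by inductively defining new uniformly random gates from the Clifford group
in the following manner:

1. Define $\mathcal{D}_{i_1} = \mathcal{C}_{i_1}$.

2. Define $\mathcal{D}_{i_2}$ uniquely by the equation
$\mathcal{C}_{i_2} = \mathcal{D}_{i_2} \circ \mathcal{D}_{i_1}^\dagger$, i.e.
$\mathcal{D}_{i_2} = \mathcal{C}_{i_2} \circ \mathcal{C}_{i_1} = \bigcirc_{s=1}^{2} \mathcal{C}_{i_s}$.

3. In general, for $j \in \{2, \dots, m\}$, if
$\mathcal{C}_{i_1}, \dots, \mathcal{C}_{i_j}$ and
$\mathcal{D}_{i_1}, \dots, \mathcal{D}_{i_j}$ have been chosen, define
$\mathcal{D}_{i_{j+1}}$ uniquely by the equation
$\mathcal{C}_{i_{j+1}} = \mathcal{D}_{i_{j+1}} \circ \mathcal{D}_{i_j}^\dagger$,
i.e.

$$\mathcal{D}_{i_{j+1}} = \mathcal{C}_{i_{j+1}} \circ \dots \circ \mathcal{C}_{i_1} = \bigcirc_{s=1}^{j+1} \mathcal{C}_{i_s}. \tag{3.13}$$

Note that if $j \ne k$, $\mathcal{C}_{i_j}$ and $\mathcal{C}_{i_k}$ are
independent and so since the Clifford elements form a group, for each
$j = 2, \dots, m+1$, $\mathcal{D}_{i_j}$ is independent of
$\mathcal{D}_{i_{j-1}}$. As well, summing over each $i_j$ index runs over
every Clifford element once and only once in $\mathcal{D}_{i_j}$.

We have created a new sequence
$(\mathcal{D}_{i_1}, \dots, \mathcal{D}_{i_m})$ from
$(\mathcal{C}_{i_1}, \dots, \mathcal{C}_{i_m})$ uniquely so that

$$\begin{aligned}
S_{\mathbf{i}_m} &= \Lambda_{i_{m+1},m+1} \circ \mathcal{C}_{i_{m+1}} \circ \Lambda_{i_m,m} \circ \mathcal{C}_{i_m} \circ \dots \circ \Lambda_{i_1,1} \circ \mathcal{C}_{i_1} \\
&= \Lambda_{i_{m+1},m+1} \circ \mathcal{D}_{i_{m+1}} \circ \mathcal{D}_{i_m}^\dagger \circ \Lambda_{i_m,m} \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_1}^\dagger \circ \Lambda_{i_1,1} \circ \mathcal{D}_{i_1}.
\end{aligned} \tag{3.14}$$

Since
$\mathcal{C}_{i_{m+1}} = \mathcal{C}_{i_1}^\dagger \circ \dots \circ \mathcal{C}_{i_m}^\dagger$
and
$\mathcal{D}_{i_{m+1}} = \mathcal{C}_{i_{m+1}} \circ \dots \circ \mathcal{C}_{i_1}$,

$$\mathcal{D}_{i_{m+1}} = \mathbb{1}. \tag{3.15}$$

Hence the m+1'th gate is decoupled from the rest of the sequence and we have

$$\begin{aligned}
S_{\mathbf{i}_m} &= \Lambda_{i_{m+1},m+1} \circ \mathcal{C}_{i_{m+1}} \circ \Lambda_{i_m,m} \circ \mathcal{C}_{i_m} \circ \dots \circ \Lambda_{i_1,1} \circ \mathcal{C}_{i_1} \\
&= \Lambda_{i_{m+1},m+1} \circ \mathcal{D}_{i_m}^\dagger \circ \Lambda_{i_m,m} \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_1}^\dagger \circ \Lambda_{i_1,1} \circ \mathcal{D}_{i_1}.
\end{aligned} \tag{3.16}$$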
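
The change of variables relies only on the group structure, so it can be illustrated with any finite group. The sketch below is a toy check, not part of the protocol: it uses the permutations of four objects as a stand-in group of order 24 (an assumption made purely for convenience), draws a random sequence, appends the inverse of its composition, and forms the cumulative products that play the role of the $\mathcal{D}_{i_j}$.

```python
import random
from itertools import permutations

GROUP = list(permutations(range(4)))   # stand-in finite group of order 24
IDENTITY = tuple(range(4))

def compose(a, b):
    """Return the composition a after b (apply b first, then a)."""
    return tuple(a[b[x]] for x in range(4))

def inverse(a):
    inv = [0] * 4
    for x, y in enumerate(a):
        inv[y] = x
    return tuple(inv)

def random_sequence(m, rng):
    """m uniformly random elements followed by the inverse of their composition."""
    gates = [rng.choice(GROUP) for _ in range(m)]
    total = IDENTITY
    for g in gates:
        total = compose(g, total)
    gates.append(inverse(total))
    return gates

def cumulative(gates):
    """Cumulative products D_j = C_j after ... after C_1."""
    out, total = [], IDENTITY
    for g in gates:
        total = compose(g, total)
        out.append(total)
    return out

rng = random.Random(0)
C = random_sequence(5, rng)
D = cumulative(C)

# For fixed D_1, the map C_2 -> D_2 is a bijection of the group, which is why
# each D_j is again uniformly distributed.
images = {compose(g, D[0]) for g in GROUP}
print(D[-1] == IDENTITY, len(images))
```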



B. Perturbative Expansion and the Fitting Models 

We would like to develop fitting models for $\mathcal{F}_g(m, \psi)$ where
the most general noise model allows for the noise to depend upon both the
set of gates in $\mathrm{Clif}_n$ and time. We can estimate the behavior of
$\mathcal{F}_g(m, \psi)$ by considering a perturbative expansion of each
$\Lambda_{i,j}$ about the average $\Lambda$. We quantify the difference
between $\Lambda_{i,j}$ and $\Lambda$ by defining for all i, j,

$$\delta\Lambda_{i,j} = \Lambda_{i,j} - \Lambda. \tag{3.17}$$

Our approach will be valid provided $\delta\Lambda_{i,j}$ is a small
perturbation from $\Lambda$ in a sense to be made precise later. Note that
each $\delta\Lambda_{i,j}$ is a Hermiticity-preserving, trace-annihilating
linear superoperator. Under the above conditions this approach will allow
for fitting the experimental fidelity decay sequence to a model with fit
parameters that determine not only the average error per gate but also the
separate contribution from the combined effects of state preparation and
measurement errors. In the limit of multiple qubits and very precise control
weaker forms of twirling may permit even more detailed modeling of the
noise.

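Using the change of variables
$\mathcal{D}_{i_j} = \bigcirc_{s=1}^{j} \mathcal{C}_{i_s}$ described above
and expanding to first order we get,

$$\begin{aligned}
S_{\mathbf{i}_m} &= \Lambda_{i_{m+1},m+1} \circ \mathcal{C}_{i_{m+1}} \circ \dots \circ \Lambda_{i_j,j} \circ \mathcal{C}_{i_j} \circ \dots \circ \Lambda_{i_1,1} \circ \mathcal{C}_{i_1} \\
&= \Lambda_{i_{m+1},m+1} \circ \mathcal{D}_{i_m}^\dagger \circ \Lambda_{i_m,m} \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_1}^\dagger \circ \Lambda_{i_1,1} \circ \mathcal{D}_{i_1} \\
&= \Lambda \circ \mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1} \\
&\quad + \delta\Lambda_{i_{m+1},m+1} \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right) \\
&\quad + \dots + \Lambda \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_j}^\dagger \circ \delta\Lambda_{i_j,j} \circ \mathcal{D}_{i_j}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right) \\
&\quad + \dots + \Lambda \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \delta\Lambda_{i_1,1} \circ \mathcal{D}_{i_1}\right) + O\left(\delta\Lambda_{i,j}^2\right).
\end{aligned} \tag{3.18}$$

We define

$$S_{\mathbf{i}_m}^{(0)} := \Lambda \circ \mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}, \tag{3.19}$$

$$\begin{aligned}
S_{\mathbf{i}_m}^{(1)} &:= \delta\Lambda_{i_{m+1},m+1} \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right) \\
&\quad + \dots + \Lambda \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_j}^\dagger \circ \delta\Lambda_{i_j,j} \circ \mathcal{D}_{i_j}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right) \\
&\quad + \dots + \Lambda \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \delta\Lambda_{i_1,1} \circ \mathcal{D}_{i_1}\right),
\end{aligned} \tag{3.20}$$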



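and so on for higher order perturbation terms. As well, recalling the
definition of $S_m$ in Eq. (3.11), we define for each order k,

$$S_m^{(k)} := \frac{1}{|\mathrm{Clif}_n|^m} \sum_{i_1, \dots, i_m} S_{\mathbf{i}_m}^{(k)} \tag{3.21}$$

and

$$\mathcal{F}_g^{(k)}(m, |\psi\rangle) := \mathrm{tr}\left[\sum_{j=0}^{k} S_m^{(j)}(\rho_\psi)\, E_\psi\right] \tag{3.22}$$

so that,

$$S_m = \sum_{k=0}^{m+1} S_m^{(k)} \tag{3.23}$$

and

$$\mathcal{F}_g(m, \psi) = \mathcal{F}_g^{(m+1)}(m, |\psi\rangle) = \mathrm{tr}\left[\sum_{k=0}^{m+1} S_m^{(k)}(\rho_\psi)\, E_\psi\right]. \tag{3.24}$$

1. Zeroth Order Model

First, we look at the zeroth order fitting model
$\mathcal{F}_g^{(0)}(m, |\psi\rangle)$ and note that
$\mathcal{F}_g^{(0)}(m, |\psi\rangle)$ is exact in the case that the noise is
independent of both the gate chosen and time, i.e.
$\Lambda_{i,j} = \Lambda$. By independence of the $\mathcal{D}_{i_j}$ and the
fact that averaging over the ensemble of realizations produces independent
twirls which depolarize m factors of $\Lambda$ (see Sec. II B) we get,

$$S_m^{(0)} = \Lambda \circ \Lambda_d \circ \dots \circ \Lambda_d = \Lambda \circ \Lambda_d^{m}. \tag{3.25}$$

Thus,

$$\begin{aligned}
\mathcal{F}_g^{(0)}(m, |\psi\rangle) &= \mathrm{tr}\left(S_m^{(0)}(\rho_\psi)\, E_\psi\right) \\
&= \mathrm{tr}\left(\Lambda(\rho_\psi)\, E_\psi\right) p^m + \mathrm{tr}\left(\Lambda\left(\tfrac{\mathbb{1}}{d}\right) E_\psi\right)(1 - p^m) \\
&= A_0 p^m + B_0
\end{aligned} \tag{3.26}$$

where

$$A_0 := \mathrm{Tr}\left[\Lambda\left(\rho_\psi - \frac{\mathbb{1}}{d}\right) E_\psi\right] \tag{3.27}$$

and

$$B_0 := \mathrm{Tr}\left[\Lambda\left(\frac{\mathbb{1}}{d}\right) E_\psi\right]. \tag{3.28}$$

Hence, assuming the simplest (ideal) scenario where the noise operator at
each step is independent of the applied gate (and is also time-invariant),
$\mathcal{F}_g(m, \psi) = \mathcal{F}_g^{(0)}(m, |\psi\rangle)$ decays
exponentially in m, at a rate set by p.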
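
The way state preparation and measurement errors enter only through the coefficients can be seen in a toy single-qubit example. The sketch below makes several assumptions for illustration: the error $\Lambda$ is itself a depolarizing channel with parameter p, the imperfect state has Bloch vector `r_vec`, and the imperfect POVM element is $(e_0 \mathbb{1} + \vec{e}\cdot\vec{\sigma})/2$, so that $\mathrm{tr}(E\rho) = (e_0 + \vec{e}\cdot\vec{r})/2$. All names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def survival_probability(m, p, r_vec, e0, e_vec):
    """Apply the depolarizing error m + 1 times, then measure the POVM element."""
    v = list(r_vec)
    for _ in range(m + 1):          # Lambda composed with m copies of Lambda_d
        v = [p * x for x in v]
    return (e0 + dot(e_vec, v)) / 2

def zeroth_order_coefficients(p, r_vec, e0, e_vec):
    """A0 = tr[Lambda(rho - 1/2) E] and B0 = tr[Lambda(1/2) E] for this toy model."""
    A0 = p * dot(e_vec, r_vec) / 2
    B0 = e0 / 2
    return A0, B0

p = 0.97
r_vec = (0.02, 0.0, 0.96)           # slightly mixed, slightly tilted |0>
e0, e_vec = 0.99, (0.0, 0.01, 0.95) # imperfect measurement of |0><0|

A0, B0 = zeroth_order_coefficients(p, r_vec, e0, e_vec)
simulated = [survival_probability(m, p, r_vec, e0, e_vec) for m in range(10)]
model = [A0 * p**m + B0 for m in range(10)]
print(A0, B0)
```

In this example the decay constant of the simulated curve is p regardless of the values chosen for `r_vec`, `e0`, and `e_vec`; those only change $A_0$ and $B_0$.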



2. First Order Model 



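To find $\mathcal{F}_g^{(1)}(m, |\psi\rangle)$ we note that in the definition
of $S_{\mathbf{i}_m}^{(1)}$ given by Eq. (3.20) there are
$\binom{m+1}{1} = m + 1$ first-order perturbation terms which contain the
gate dependence. First, we consider the m - 1 terms with
$j \in \{2, \dots, m\}$. For each such j, averaging over the
$\{i_1, \dots, i_m\}$ gives a term of the form,

$$\frac{1}{|\mathrm{Clif}_n|^m} \sum_{i_1, \dots, i_m} \Lambda \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_j}^\dagger \circ \delta\Lambda_{i_j,j} \circ \mathcal{D}_{i_j}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right). \tag{3.29}$$

For these m - 1 terms the main trick is to realize that we can re-expand
$\mathcal{D}_{i_j} = \mathcal{C}_{i_j} \circ \mathcal{D}_{i_{j-1}}$ in order
to depolarize the unitarily rotated perturbation
$\mathcal{C}_{i_j}^\dagger \circ \delta\Lambda_{i_j,j} \circ \mathcal{C}_{i_j}$
with the twirling operation
$\frac{1}{|\mathrm{Clif}_n|} \sum_{i_{j-1}} \mathcal{D}_{i_{j-1}}^\dagger \cdot \mathcal{D}_{i_{j-1}}$
because the sums are independent. More precisely, the above can be written
as,

$$\begin{aligned}
&\frac{1}{|\mathrm{Clif}_n|^m} \sum_{i_1, \dots, i_m} \Lambda \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_{j-1}}^\dagger \circ \left(\mathcal{C}_{i_j}^\dagger \circ \delta\Lambda_{i_j,j} \circ \mathcal{C}_{i_j} \circ \Lambda\right) \circ \mathcal{D}_{i_{j-1}}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right) \\
&\quad = \Lambda \circ \Lambda_d^{m-j} \circ \left((Q_j \circ \Lambda)_d - \Lambda_d^2\right) \circ \Lambda_d^{j-2},
\end{aligned} \tag{3.30}$$

where
$Q_j := \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \mathcal{C}_i^\dagger \circ \Lambda_{i,j} \circ \mathcal{C}_i$
and the subscript d represents the depolarization of the operator within
brackets. Using the fact that depolarizing channels commute we get,

$$\Lambda \circ \Lambda_d^{m-j} \circ \left((Q_j \circ \Lambda)_d - \Lambda_d^2\right) \circ \Lambda_d^{j-2} = \Lambda \circ \left((Q_j \circ \Lambda)_d - \Lambda_d^2\right) \circ \Lambda_d^{m-2}. \tag{3.31}$$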



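For the term with j = 1, averaging over $i_1, \dots, i_m$ gives a term of
the form,

$$\Lambda \circ \Lambda_d^{m-1} \circ \frac{1}{|\mathrm{Clif}_n|} \sum_{i_1} \mathcal{D}_{i_1}^\dagger \circ \delta\Lambda_{i_1,1} \circ \mathcal{D}_{i_1} = \Lambda \circ \Lambda_d^{m-1} \circ (Q_1 - \Lambda_d). \tag{3.32}$$

Lastly for the term with j = m + 1, averaging gives,

$$\frac{1}{|\mathrm{Clif}_n|^m} \sum_{i_1, \dots, i_m} \delta\Lambda_{i_{m+1},m+1} \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right) \tag{3.33}$$

$$= \frac{1}{|\mathrm{Clif}_n|^{m-1}} \sum_{i_1, \dots, i_{m-1}} \left(\frac{1}{|\mathrm{Clif}_n|} \sum_{i_m} \delta\Lambda_{i_{m+1},m+1} \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right)\right) \circ \left(\mathcal{D}_{i_{m-1}}^\dagger \circ \Lambda \circ \mathcal{D}_{i_{m-1}}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right). \tag{3.34}$$

Since $\mathrm{Clif}_n$ is a group, if $i_1, \dots, i_{m-1}$ is fixed,
averaging over the $i_m$ index runs through every Clifford element with
equal frequency in the $\mathcal{D}_{i_m}$ random variable. Since
$\Lambda_{i_{m+1},m+1}$ is just the error associated with the gate
$\mathcal{D}_{i_m}^\dagger$,
$\frac{1}{|\mathrm{Clif}_n|} \sum_{i_m} \Lambda_{i_{m+1},m+1} \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right)$
is independent of the $i_1, \dots, i_{m-1}$ indices. Hence we can define

$$\begin{aligned}
\mathcal{R}_{m+1} &:= \frac{1}{|\mathrm{Clif}_n|} \sum_{i_m} \Lambda_{i_{m+1},m+1} \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \\
&= \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \Lambda_{i',m+1} \circ \left(\mathcal{C}_i^\dagger \circ \Lambda \circ \mathcal{C}_i\right)
\end{aligned} \tag{3.35}$$

where $\Lambda_{i',m+1}$ denotes the error that arises when the Clifford
operation $\mathcal{C}_i^\dagger$ is applied at final time-step m + 1. Again,
using the group property of $\mathrm{Clif}_n$ we have,

$$\mathcal{R}_{m+1} = \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \Lambda_{i,m+1} \circ \left(\mathcal{C}_i \circ \Lambda \circ \mathcal{C}_i^\dagger\right). \tag{3.36}$$

This decoupling of $\mathcal{R}_{m+1}$ allows us to write,

$$\frac{1}{|\mathrm{Clif}_n|^m} \sum_{i_1, \dots, i_m} \delta\Lambda_{i_{m+1},m+1} \circ \left(\mathcal{D}_{i_m}^\dagger \circ \Lambda \circ \mathcal{D}_{i_m}\right) \circ \dots \circ \left(\mathcal{D}_{i_1}^\dagger \circ \Lambda \circ \mathcal{D}_{i_1}\right) = \left(\mathcal{R}_{m+1} - \Lambda \circ \Lambda_d\right) \circ \Lambda_d^{m-1}. \tag{3.37}$$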



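Hence combining Eq.'s (3.25), (3.31), (3.32) and (3.37) gives,

$$\begin{aligned}
S_m^{(0)} + S_m^{(1)} &= \Lambda \circ \Lambda_d^{m} + \left(\mathcal{R}_{m+1} - \Lambda \circ \Lambda_d\right) \circ \Lambda_d^{m-1} + \sum_{j=2}^{m} \Lambda \circ \left((Q_j \circ \Lambda)_d - \Lambda_d^2\right) \circ \Lambda_d^{m-2} + \Lambda \circ \Lambda_d^{m-1} \circ (Q_1 - \Lambda_d) \\
&= \mathcal{R}_{m+1} \circ \Lambda_d^{m-1} + \sum_{j=2}^{m} \left(\Lambda \circ (Q_j \circ \Lambda)_d \circ \Lambda_d^{m-2}\right) + \Lambda \circ \Lambda_d^{m-1} \circ Q_1 - m\left(\Lambda \circ \Lambda_d^{m}\right).
\end{aligned} \tag{3.38}$$

To calculate

$$\mathcal{F}_g^{(1)}(m, |\psi\rangle) := \mathrm{tr}\left[\left(S_m^{(0)} + S_m^{(1)}\right)(\rho_\psi)\, E_\psi\right] \tag{3.39}$$

we have,

$$\mathrm{tr}\left(\mathcal{R}_{m+1} \circ \Lambda_d^{m-1}(\rho_\psi)\, E_\psi\right) = G_{1,m+1}\, p^{m-1} + H_{1,m+1}, \tag{3.40}$$

$$\mathrm{tr}\left(\Lambda \circ (Q_j \circ \Lambda)_d \circ \Lambda_d^{m-2}(\rho_\psi)\, E_\psi\right) = A_0\, q_j\, p^{m-2} + B_0,$$

$$\mathrm{tr}\left(\Lambda \circ \Lambda_d^{m-1} \circ Q_1(\rho_\psi)\, E_\psi\right) = A_{1,1}\, p^{m-1} + B_0, \tag{3.41}$$

$$\mathrm{tr}\left(\Lambda \circ \Lambda_d^{m}(\rho_\psi)\, E_\psi\right) = A_0\, p^{m} + B_0, \tag{3.42}$$

where
$G_{1,m+1} := \mathrm{tr}\left(\mathcal{R}_{m+1}\left(\rho_\psi - \frac{\mathbb{1}}{d}\right) E_\psi\right)$,
$H_{1,m+1} := \mathrm{tr}\left(\mathcal{R}_{m+1}\left(\frac{\mathbb{1}}{d}\right) E_\psi\right)$,
$A_{1,1} := \mathrm{tr}\left(\Lambda\left(Q_1(\rho_\psi) - \frac{\mathbb{1}}{d}\right) E_\psi\right)$,
$A_0$ and $B_0$ are as given in Eq.s (3.27) and (3.28), and $q_j$ is the
depolarization parameter for $(Q_j \circ \Lambda)_d$. Thus,

$$\begin{aligned}
\mathcal{F}_g^{(1)}(m, |\psi\rangle) &= G_{1,m+1}\, p^{m-1} + H_{1,m+1} + \sum_{j=2}^{m} \left(A_0\, q_j\, p^{m-2} + B_0\right) + A_{1,1}\, p^{m-1} + B_0 - m\left(A_0\, p^{m} + B_0\right) \\
&= p^{m-1}\left(G_{1,m+1} + A_{1,1} - A_0\, p\right) + (m-1)\, A_0\, p^{m-2} \left(\frac{\sum_{j=2}^{m} q_j}{m-1} - p^2\right) + H_{1,m+1}.
\end{aligned} \tag{3.43}$$

Finally, we can also re-write Eq. (3.43) as,

$$\mathcal{F}_g^{(1)}(m, |\psi\rangle) = A_1(m)\, p^{m} + B_1(m) + C_1 (m-1)\left(q(m) - p^2\right) p^{m-2} \tag{3.44}$$

where,

$$\begin{aligned}
A_1(m) &= \frac{G_{1,m+1} + A_{1,1}}{p} - A_0 = \mathrm{Tr}\left[\Lambda\left(\frac{Q_1(\rho_\psi)}{p} - \rho_\psi + \frac{(p-1)\mathbb{1}}{pd}\right) E_\psi\right] + \mathrm{Tr}\left[\mathcal{R}_{m+1}\left(\frac{\rho_\psi}{p} - \frac{\mathbb{1}}{pd}\right) E_\psi\right], \\
B_1(m) &= H_{1,m+1} = \mathrm{Tr}\left[\mathcal{R}_{m+1}\left(\frac{\mathbb{1}}{d}\right) E_\psi\right], \\
C_1 &= A_0 = \mathrm{Tr}\left[\Lambda\left(\rho_\psi - \frac{\mathbb{1}}{d}\right) E_\psi\right], \\
q(m) &= \frac{\sum_{j=2}^{m} q_j}{m-1},
\end{aligned} \tag{3.45}$$

and $q_j$ is the depolarizing parameter defined by

$$(Q_j \circ \Lambda)_d(\rho) = q_j\, \rho + (1 - q_j)\frac{\mathbb{1}}{d}. \tag{3.46}$$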



We write the first order model in the form of Eq. (3.44) because of its similarity to that of the zeroth order model given by Eq. (3.26). The difference between Eqs. (3.44) and (3.26) is the $C_1 (m-1)(q(m) - p^{2}) p^{m-2}$ term contained in Eq. (3.44), which can be thought of as a measure of the gate-dependence of the noise.

Again, we see that the edge effects, state-preparation and measurement errors are embedded in the three coefficients $A_1(m)$, $B_1(m)$, and $C_1$. Note that the m dependence in $q(m)$ and in the $A_1(m)$ and $B_1(m)$ coefficients due to the last gate disappears if the errors don't change as a function of time.
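The relationship between the two fitting models can be made concrete with a short numerical sketch. The function names and the parameter values below are illustrative assumptions (they do not come from the paper), and the coefficients are taken to be time-independent so that $A_1$, $B_1$ and $q$ carry no m dependence. The sketch only evaluates Eqs. (3.26) and (3.44); it is not a fitting routine.

```python
def zeroth_order(m, A0, B0, p):
    """Zeroth-order model of Eq. (3.26): A0 * p**m + B0."""
    return A0 * p**m + B0


def first_order(m, A1, B1, C1, p, q):
    """First-order model of Eq. (3.44), assuming time-independent A1, B1, q."""
    return A1 * p**m + B1 + C1 * (m - 1) * (q - p**2) * p**(m - 2)


# Hypothetical parameter values, chosen only for illustration.
p, A, B, C = 0.98, 0.45, 0.5, 0.45

# With q = p**2 the gate-dependence term vanishes and the two models
# have the same functional form.
lengths = range(2, 50)
gap_no_gate_dependence = max(
    abs(first_order(m, A, B, C, p, p**2) - zeroth_order(m, A, B, p))
    for m in lengths
)

# With q != p**2 the models differ by C1 * (m - 1) * (q - p**2) * p**(m - 2).
gap_with_gate_dependence = max(
    abs(first_order(m, A, B, C, p, 0.97) - zeroth_order(m, A, B, p))
    for m in lengths
)
```

In this sketch the deviation between the models is controlled entirely by $q - p^2$, which is why that quantity serves as the measure of gate-dependence.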



IV. NEGLECTING HIGHER ORDERS 

A. Bounding Higher Order Perturbation Terms 

We would like to give conditions for when one is justified in stopping the expansion at some order k. The main idea, as expressed in Eq. (4.1) below, is to bound the "size" of the terms in $\mathcal{S}_m^{(k+1)}$, and we use the "1 → 1" norm on linear superoperators maximized over Hermitian inputs, denoted $\|\cdot\|_{1\to1}^{H}$, to make this precise (see Sec. II). Note that $\|\cdot\|_{1\to1}^{H}$ has the following useful properties:

• submultiplicativity for Hermiticity-preserving superoperators,

• unitary invariance,

• $\|\mathcal{E}\|_{1\to1}^{H} \le 1$ for any quantum operation $\mathcal{E}$.

Later we will discuss the motivation for using $\|\cdot\|_{1\to1}^{H}$ as opposed to more familiar norms used in quantum information theory such as the diamond norm $\|\cdot\|_{\diamond}$.
From Sec. III A we have that,

$$
\left| \mathrm{tr}\left[ \sum_{j=0}^{k+1} \mathcal{S}_m^{(j)}(\rho_\psi) E_\psi \right] - \mathrm{tr}\left[ \sum_{j=0}^{k} \mathcal{S}_m^{(j)}(\rho_\psi) E_\psi \right] \right|
= \left| \mathrm{tr}\left[ \mathcal{S}_m^{(k+1)}(\rho_\psi) E_\psi \right] \right|
\le \left\| \mathcal{S}_m^{(k+1)} \right\|_{1\to1}^{H}
\tag{4.1}
$$

and so bounding $\mathcal{S}_m^{(k+1)}$ provides a bound for how much the k and (k + 1)-order fidelities will differ. We first look at the case of stopping at first order, i.e. k = 1. There are $\binom{m+1}{2} = \frac{(m+1)m}{2}$ second order perturbation terms in Eq. (3.18). Let us look at a term with perturbations at $j_1$ and $j_2$ where without loss of generality we assume $j_2 > j_1$. Using the properties listed above, along with the triangle inequality, we have,



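$$
\begin{aligned}
&\left\| \frac{1}{|\mathrm{Clif}_n|^{m}} \sum_{i_1,\dots,i_m} \Lambda \circ \mathcal{D}_{i_m}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_{j_2}}^{\dagger} \circ \delta\Lambda_{i_{j_2},j_2} \circ \mathcal{D}_{i_{j_2}} \circ \dots \circ \mathcal{D}_{i_{j_1}}^{\dagger} \circ \delta\Lambda_{i_{j_1},j_1} \circ \mathcal{D}_{i_{j_1}} \circ \dots \circ \mathcal{D}_{i_1}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_1} \right\|_{1\to1}^{H} \\
&\quad \le \frac{1}{|\mathrm{Clif}_n|^{m}} \sum_{i_1,\dots,i_m} \|\Lambda\|_{1\to1}^{H} \left\| \mathcal{D}_{i_m}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_m} \right\|_{1\to1}^{H} \cdots \left\| \mathcal{D}_{i_{j_2}}^{\dagger} \circ \delta\Lambda_{i_{j_2},j_2} \circ \mathcal{D}_{i_{j_2}} \right\|_{1\to1}^{H} \cdots \left\| \mathcal{D}_{i_{j_1}}^{\dagger} \circ \delta\Lambda_{i_{j_1},j_1} \circ \mathcal{D}_{i_{j_1}} \right\|_{1\to1}^{H} \cdots \left\| \mathcal{D}_{i_1}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_1} \right\|_{1\to1}^{H} \\
&\quad \le \frac{1}{|\mathrm{Clif}_n|^{m}} \sum_{i_1,\dots,i_m} \left\| \delta\Lambda_{i_{j_2},j_2} \right\|_{1\to1}^{H} \left\| \delta\Lambda_{i_{j_1},j_1} \right\|_{1\to1}^{H} \\
&\quad = \left( \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \left\| \delta\Lambda_{i,j_2} \right\|_{1\to1}^{H} \right) \left( \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \left\| \delta\Lambda_{i,j_1} \right\|_{1\to1}^{H} \right) \\
&\quad = \gamma_{j_2} \gamma_{j_1},
\end{aligned}
\tag{4.2}
$$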



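where we define the time-dependent variation in the noise,

$$
\gamma_j := \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \left\| \Lambda_{i,j} - \Lambda \right\|_{1\to1}^{H}.
\tag{4.3}
$$

Summing over all $j_1$, $j_2$ with $j_2 > j_1$ gives,

$$
\begin{aligned}
\left\| \mathcal{S}_m^{(2)} \right\|_{1\to1}^{H}
&= \left\| \sum_{j_2 > j_1} \frac{1}{|\mathrm{Clif}_n|^{m}} \sum_{i_1,\dots,i_m} \Lambda \circ \mathcal{D}_{i_m}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_{j_2}}^{\dagger} \circ \delta\Lambda_{i_{j_2},j_2} \circ \mathcal{D}_{i_{j_2}} \circ \dots \circ \mathcal{D}_{i_{j_1}}^{\dagger} \circ \delta\Lambda_{i_{j_1},j_1} \circ \mathcal{D}_{i_{j_1}} \circ \dots \circ \mathcal{D}_{i_1}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_1} \right\|_{1\to1}^{H} \\
&\le \sum_{j_2 > j_1} \left\| \frac{1}{|\mathrm{Clif}_n|^{m}} \sum_{i_1,\dots,i_m} \Lambda \circ \mathcal{D}_{i_m}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_m} \circ \dots \circ \mathcal{D}_{i_{j_2}}^{\dagger} \circ \delta\Lambda_{i_{j_2},j_2} \circ \mathcal{D}_{i_{j_2}} \circ \dots \circ \mathcal{D}_{i_{j_1}}^{\dagger} \circ \delta\Lambda_{i_{j_1},j_1} \circ \mathcal{D}_{i_{j_1}} \circ \dots \circ \mathcal{D}_{i_1}^{\dagger} \circ \Lambda \circ \mathcal{D}_{i_1} \right\|_{1\to1}^{H} \\
&\le \sum_{j_2 > j_1} \gamma_{j_2} \gamma_{j_1}.
\end{aligned}
\tag{4.4}
$$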
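The right-hand side of Eq. (4.4) is straightforward to evaluate once the per-time-step variations $\gamma_j$ are known. The following is a minimal sketch, assuming the $\gamma_j$ for the m + 1 time-steps are supplied as a plain list; the function name and the numerical values are hypothetical. When every $\gamma_j$ equals a common value $\gamma$, the sum over pairs reduces to $\binom{m+1}{2}\gamma^2 = \frac{(m+1)m}{2}\gamma^2$.

```python
from itertools import combinations
from math import comb


def second_order_bound(gammas):
    """Sum of gamma_{j2} * gamma_{j1} over all pairs j2 > j1, as in Eq. (4.4)."""
    return sum(g1 * g2 for g1, g2 in combinations(gammas, 2))


# Hypothetical values: sequence length m and a time-independent variation gamma.
m = 10
gamma = 0.01
uniform_bound = second_order_bound([gamma] * (m + 1))
closed_form = comb(m + 1, 2) * gamma**2

# A time-dependent example: the variation drifts slowly upward with the time-step.
drifting = [0.01 + 0.001 * j for j in range(m + 1)]
drifting_bound = second_order_bound(drifting)
```

Because every term in the sum is non-negative, the bound grows quadratically in m for fixed $\gamma$, so the first-order model is only trustworthy while $(m+1)m\gamma^2/2$ remains small.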

In terms of the fidelity we thus have from Eqs. (4.1) and (4.4),

$$
\left| \mathcal{F}_g^{(2)}(m, |\psi\rangle) - \mathcal{F}_g^{(1)}(m, |\psi\rangle) \right| \le \sum_{j_2 > j_1} \gamma_{j_2} \gamma_{j_1}.
\tag{4.5}
$$

Note that if the noise is time-independent then we have,

$$
\sum_{j_2 > j_1} \gamma_{j_2} \gamma_{j_1} = \frac{(m+1)m}{2} \gamma^{2}
\tag{4.6}
$$

which gives,

$$
\left| \mathcal{F}_g^{(2)}(m, |\psi\rangle) - \mathcal{F}_g^{(1)}(m, |\psi\rangle) \right| \le \frac{(m+1)m}{2} \gamma^{2}.
\tag{4.7}
$$

It is straightforward to show that bounds on higher order terms go as

$$
\left\| \mathcal{S}_m^{(k)} \right\|_{1\to1}^{H} \le \sum_{j_k > \dots > j_1} \gamma_{j_k} \cdots \gamma_{j_1}
\tag{4.8}
$$

so that the difference between the k and (k + 1)-order fidelities is bounded by,

$$
\left| \mathcal{F}_g^{(k+1)}(m, \psi) - \mathcal{F}_g^{(k)}(m, \psi) \right| \le \sum_{j_{k+1} > \dots > j_1} \gamma_{j_{k+1}} \cdots \gamma_{j_1}.
\tag{4.9}
$$

Again if the noise is time-independent,

$$
\left| \mathcal{F}_g^{(k+1)}(m, \psi) - \mathcal{F}_g^{(k)}(m, \psi) \right| \le \binom{m+1}{k+1} \gamma^{k+1}.
\tag{4.10}
$$

We now discuss our motivation for using $\|\cdot\|_{1\to1}^{H}$ as opposed to more familiar norms for distinguishing superoperators, such as the diamond norm. For any superoperator norm $\|\cdot\|$ that satisfies the properties listed above, the following inequality holds,

$$
\left| \mathcal{F}_g^{(k+1)}(m, \psi) - \mathcal{F}_g^{(k)}(m, \psi) \right| \le \binom{m+1}{k+1} \gamma^{k+1}
\tag{4.11}
$$

where

$$
\gamma := \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \left\| \Lambda_i - \Lambda \right\|
\tag{4.12}
$$

and for simplicity we have assumed time-independent noise.

The above equations show that in order to give the tightest bound on the fidelity difference we would like to find the norm $\|\cdot\|$ that provides the smallest value of $\gamma$. The diamond norm $\|\cdot\|_{\diamond}$ is a candidate, however by Eq. (2.11) $\|\cdot\|_{1\to1}^{H}$ is much weaker than $\|\cdot\|_{\diamond}$. Therefore $\gamma$ associated with $\|\cdot\|_{1\to1}^{H}$ will be much smaller than $\gamma$ associated with $\|\cdot\|_{\diamond}$, providing a tighter bound on the fidelity difference.

B. Case Where Benchmarking Fails

There is a simple (and highly unphysical) case for when benchmarking fails. Suppose the noise is time-independent and for each i, $\Lambda_i = \mathcal{C}_i^{\dagger}$. Then $\mathcal{F}_g(m, \psi) = 1$ for every m even though there is substantial error on each $\mathcal{C}_i$, and so benchmarking clearly fails. The key point to note here is that the noise is highly dependent on the gate chosen and so we expect that the sufficient condition derived above for ignoring higher order terms will not be satisfied (i.e. $\gamma$ in this example will be far from 0). To see that this is the case, note that since $\mathrm{Clif}_n$ is a unitary 2-design it is also a unitary 1-design. Hence since $\mathrm{Clif}_n$ is †-closed,

$$
\Lambda = \frac{1}{|\mathrm{Clif}_n|} \sum_{i=1}^{|\mathrm{Clif}_n|} \Lambda_i
= \frac{1}{|\mathrm{Clif}_n|} \sum_{i=1}^{|\mathrm{Clif}_n|} \mathcal{C}_i^{\dagger}
= \frac{1}{|\mathrm{Clif}_n|} \sum_{i=1}^{|\mathrm{Clif}_n|} \mathcal{C}_i
= \Omega
\tag{4.13}
$$

where $\Omega$ is the totally depolarizing channel mapping every input state to the maximally mixed state $\frac{\mathbb{1}}{d}$. Therefore,

$$
\left\| \Lambda_i - \Lambda \right\|_{1\to1}^{H} = \left\| \mathcal{C}_i^{\dagger} - \Omega \right\|_{1\to1}^{H}.
\tag{4.14}
$$

Now $\|\Lambda_i - \Lambda\|_{1\to1}^{H}$ is achieved at a pure state and for any pure state $|\phi\rangle$,

$$
(\Lambda_i - \Lambda)(|\phi\rangle\langle\phi|) = C_i^{\dagger} |\phi\rangle\langle\phi| C_i - \frac{\mathbb{1}}{d}.
\tag{4.15}
$$

Hence if $|\phi\rangle$ is a pure state at which $\|\Lambda_i - \Lambda\|_{1\to1}^{H}$ is achieved,

$$
\left\| \Lambda_i - \Lambda \right\|_{1\to1}^{H} = \frac{2(d-1)}{d}.
\tag{4.16}
$$

Therefore in this case,

$$
\gamma = \frac{1}{|\mathrm{Clif}_n|} \sum_{i} \frac{2(d-1)}{d} = \frac{2(d-1)}{d} \ge 1
\tag{4.17}
$$

and so our sufficient condition is not satisfied as expected.

It is important to note that one can devise tests for when such a pathological case is occurring. One simple test is given as follows: If the input state is $|\psi\rangle$ then choose Clifford elements $\mathcal{C}_i$ that map $|\psi\rangle$ to an orthogonal state in the measurement basis containing $|\psi\rangle$. For each i, apply $\mathcal{C}_i$ to $|\psi\rangle$ and perform the measurement. For small noise strength the output of the measurement should almost never be $\psi$, however if the noise is something close to the inverse of the gate the measurement result will be $\psi$ with high probability.

C. State Preparation and Measurement Errors

In this section we analyze the effect of state preparation and measurement errors on the benchmarking protocol. The main result is that these errors can be ignored in situations of practical relevance. For simplicity of the discussion let us assume the gate-dependence of the noise is weak enough so that the zeroth order expression given in Eq. (3.26) is a valid model for the fidelity decay curve. One can obtain an estimate for p as long as the fidelity curve is not constant. As state-preparation and measurement errors are accounted for in $A_0$ and $B_0$ we can obtain an estimate for p regardless of the form of the state-preparation and measurement errors whenever the curve is not constant. Thus the protocol is robust against any state preparation or measurement errors unless these errors create a constant fidelity curve. It is straightforward to characterize exactly when the fidelity curve is constant.

From Eq. (3.26) an exponential decay occurs if and only if $A_0$ is non-zero and p lies in (0, 1). Hence no decay occurs if and only if one of p = 0, p = 1 or $A_0 = 0$ occurs. We look at each case separately.

p = 0: This occurs if and only if $\Lambda$ is the totally depolarizing channel and in this case the fidelity is constant at $B_0$. Since we have assumed small gate-dependence, this case is only possible if most of the errors are approximately centred around the totally depolarizing channel with little variation. This situation is of little practical relevance since the gate operations being characterized are usually reasonably precise.

p = 1: This case corresponds to $\Lambda$ being the identity channel which means all gates are perfect. Again, in practice this situation is unlikely as the implementation of any gate will have some associated error. Note that in this case the fidelity is equal to $A_0 + B_0$ which is just $\mathrm{tr}(\Lambda(\rho_\psi) E_\psi) = \mathrm{tr}(\rho_\psi E_\psi)$. Hence the constant decay curve is a measure of the overlap between the imperfect input state and imperfect POVM element.

$A_0 = 0$: The case $A_0 = 0$ occurs if and only if

$$
\mathrm{tr}\left( E_\psi \Lambda(\rho_\psi) \right) = \mathrm{tr}\left( E_\psi \Lambda\!\left( \frac{\mathbb{1}}{d} \right) \right).
\tag{4.18}
$$

Thus $\Lambda(\rho_\psi)$ and $\Lambda\!\left(\frac{\mathbb{1}}{d}\right)$ have the same probability of producing the output "$\psi$" from the measurement. Since gates are reasonably precise in practice, this situation occurs when at least one of state preparation or measurement has substantial error. Note that the fidelity will be equal to $B_0$ in this case and so can take any value in [0, 1].

From the above three cases, the only one that depends upon state preparation or measurement errors is the case $A_0 = 0$. Since this case occurs only when at least one of state preparation or measurement has substantial error, it is unlikely to arise in practice. This discussion shows that a constant fidelity decay curve can only occur in extreme cases and so it is safe to assume the protocol is independent of state preparation and measurement errors.
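The robustness argument of this section can be illustrated with a small sketch. For the zeroth order model $A_0 p^m + B_0$, the ratio of successive differences of the curve equals p for any $A_0$ and $B_0$, and it is undefined exactly when the curve is flat. The helper below is a hypothetical illustration that assumes exact (noise-free) fidelity values at three consecutive sequence lengths; with real data one would instead fit the model by least squares.

```python
def decay_from_differences(f0, f1, f2, tol=1e-12):
    """Estimate p from F(m), F(m+1), F(m+2) of the model A0 * p**m + B0.

    Returns None when the first difference vanishes, i.e. when the curve
    is constant and p cannot be identified.
    """
    d1 = f1 - f0
    d2 = f2 - f1
    if abs(d1) < tol:
        return None
    return d2 / d1


def curve(m, A0, B0, p):
    # Zeroth-order decay of Eq. (3.26).
    return A0 * p**m + B0


p_true = 0.98
# Two hypothetical SPAM settings: different A0 and B0, same gate noise.
good_spam = [curve(m, 0.49, 0.50, p_true) for m in (1, 2, 3)]
poor_spam = [curve(m, 0.30, 0.55, p_true) for m in (1, 2, 3)]
p_good = decay_from_differences(*good_spam)
p_poor = decay_from_differences(*poor_spam)

# The degenerate cases discussed above give a flat curve for m >= 1.
flat_a0 = decay_from_differences(*[curve(m, 0.0, 0.5, p_true) for m in (1, 2, 3)])
flat_p1 = decay_from_differences(*[curve(m, 0.45, 0.5, 1.0) for m in (1, 2, 3)])
flat_p0 = decay_from_differences(*[curve(m, 0.45, 0.5, 0.0) for m in (1, 2, 3)])
```

Both SPAM settings return the same decay parameter, while the three degenerate cases return `None`, mirroring the case analysis above.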



V. AVERAGE ERROR RATE AND THE 
DIAMOND NORM 

In terms of connections between the average error rate r and relevant fault-tolerant measures of error, it is natural to ask how the error rate r between $\Lambda$ and $\mathcal{I}$ is related to the diamond norm distance between $\Lambda$ and $\mathcal{I}$. In general an explicit relationship will be impossible to obtain, however we show that in certain cases that are relevant in various fault-tolerant noise models we can obtain such a relationship. First we give a new proof of a previously established result [38] for calculating the diamond norm distance between generalized Pauli channels. The proof we present here illustrates how one can apply a semidefinite program to calculate the diamond norm distance between quantum channels [49]. Ideally, this proof technique could be used to either explicitly calculate or place bounds on the diamond norm distance between more general classes of quantum channels. This could allow for obtaining further relationships between r and the diamond norm distance which hold in more general cases.



A. Calculating the Diamond Norm Distance 
Between Generalized Pauli Channels 

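Suppose $\mathcal{E}_1$ and $\mathcal{E}_2$ are Pauli channels, or more generally any channels with Kraus operators given by an orthogonal (normalized to d) basis of unitary operators $\{P_i\}_{i=0}^{d^2-1}$ (which we call generalized Pauli channels),

$$
\mathcal{E}_1(\rho) = \sum_{i=0}^{d^2-1} q_i P_i \rho P_i^{\dagger},
\tag{5.1}
$$

$$
\mathcal{E}_2(\rho) = \sum_{i=0}^{d^2-1} r_i P_i \rho P_i^{\dagger}.
\tag{5.2}
$$

Define the vector v of length $d^2$ by

$$
v_i = q_i - r_i
\tag{5.3}
$$

for all $i \in \{0, \dots, d^2 - 1\}$. Then,

$$
\left\| \mathcal{E}_1 - \mathcal{E}_2 \right\|_{\diamond} = \|v\|_1 = \sum_{i=0}^{d^2-1} |v_i|.
\tag{5.4}
$$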

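To prove Eq. (5.4) using the semidefinite program in [49] first note that $\Phi = \mathcal{E}_1 - \mathcal{E}_2$ has action,

$$
\Phi(\rho) = \sum_{i=0}^{d^2-1} (q_i - r_i) P_i \rho P_i^{\dagger}.
\tag{5.5}
$$

The semidefinite program has the following primal and dual problems:

Primal problem: Maximize $\langle J(\Phi), W \rangle$ subject to $W \le \mathbb{1}_d \otimes \rho$, $W \in \mathrm{Pos}(L(\mathbb{C}^d \otimes \mathbb{C}^d))$, $\rho \in D(L(\mathbb{C}^d))$,

Dual problem: Minimize $\|\mathrm{tr}_1(Z)\|_{\infty}$ subject to $Z \ge J(\Phi)$, $Z \in \mathrm{Pos}(L(\mathbb{C}^d \otimes \mathbb{C}^d))$,

where $J(\Phi)$ is the Choi matrix [50] of $\Phi$. If $\alpha$ and $\beta$ are the solutions to the primal and dual problems then the case that $\alpha = \beta$ is called strong duality. It is shown in [49] that the above semidefinite program always has the property of strong duality and the solution to the program is $\alpha = \frac{1}{2} \|\mathcal{E}_1 - \mathcal{E}_2\|_{\diamond}$. Note also that it is always the case that $\alpha \le \beta$.
By definition,

$$
\begin{aligned}
J(\Phi) &= d\, (\Phi \otimes \mathcal{I})(|\psi_0\rangle\langle\psi_0|) \\
&= d \sum_{i=0}^{d^2-1} (q_i - r_i)\, (P_i \otimes \mathbb{1}) |\psi_0\rangle\langle\psi_0| (P_i^{\dagger} \otimes \mathbb{1}).
\end{aligned}
\tag{5.6}
$$

Noting that $\{ |\psi_i\rangle := (P_i \otimes \mathbb{1}) |\psi_0\rangle \}_{i=0}^{d^2-1}$ forms an orthonormal basis of maximally entangled states for $\mathbb{C}^d \otimes \mathbb{C}^d$, which we call the generalized Bell basis (GBB), we have that $J(\Phi)$ is diagonal when written in GBB with diagonal elements (eigenvalues) $d(q_i - r_i)$. Let $\Pi_+$ denote the projector onto the eigenspace with non-negative eigenvalues and $\Pi_-$ denote the projector onto the eigenspace with negative eigenvalues.

For the primal problem let $W = \frac{\Pi_+}{d}$ and $\rho = \frac{\mathbb{1}}{d}$. Then

$$
\langle J(\Phi), W \rangle = \sum_{k : q_k - r_k \ge 0} (q_k - r_k) = \frac{1}{2} \sum_{k} |q_k - r_k| = \frac{1}{2} \|v\|_1.
\tag{5.7}
$$

Thus $\alpha \ge \frac{1}{2} \|v\|_1$.

For the dual problem take $Z = \Pi_+ J(\Phi) \Pi_+$, which is just $d \sum_{k : q_k - r_k \ge 0} (q_k - r_k) |\psi_k\rangle\langle\psi_k|$, and note $Z \ge J(\Phi)$. Moreover, $\mathrm{tr}_1(Z) = d \left( \sum_{k : q_k - r_k \ge 0} (q_k - r_k) \right) \frac{\mathbb{1}}{d}$ and so

$$
\|\mathrm{tr}_1(Z)\|_{\infty} = \sum_{k : q_k - r_k \ge 0} (q_k - r_k) = \frac{1}{2} \|v\|_1.
\tag{5.8}
$$

Thus $\alpha \le \frac{1}{2} \|v\|_1$, which implies $\alpha = \frac{1}{2} \|v\|_1$ and $\|\mathcal{E}_1 - \mathcal{E}_2\|_{\diamond} = \|v\|_1$ as desired.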
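Equation (5.4) reduces the diamond norm distance between two generalized Pauli channels to an $\ell_1$ distance between probability vectors, which takes a few lines to evaluate. The sketch below is illustrative only: the function names are hypothetical, the channels are represented directly by their weight vectors $(q_0, \dots, q_{d^2-1})$ with $P_0$ taken to be the identity, and the depolarizing channel $\rho \mapsto p\rho + (1-p)\mathbb{1}/d$ is used as a sanity check.

```python
def pauli_diamond_distance(q, r):
    """Diamond norm distance of Eq. (5.4): the l1 distance between weight vectors."""
    if len(q) != len(r):
        raise ValueError("weight vectors must have the same length")
    return sum(abs(qi - ri) for qi, ri in zip(q, r))


def depolarizing_weights(p, d):
    """Weights of rho -> p * rho + (1 - p) * 1/d in a unitary operator basis.

    The maximally mixed state is the uniform mixture of all d**2 basis
    conjugations, so every weight receives (1 - p) / d**2 and the identity
    term P_0 receives an additional p.
    """
    rest = (1 - p) / d**2
    return [p + rest] + [rest] * (d**2 - 1)


# Hypothetical single-qubit example (d = 2).
d = 2
p1, p2 = 0.99, 0.95
distance = pauli_diamond_distance(depolarizing_weights(p1, d),
                                  depolarizing_weights(p2, d))
```

For two depolarizing channels this evaluates to $2|p_1 - p_2|(d^2-1)/d^2$, the value obtained analytically in the text.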
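As a simple corollary to Eq. (5.4) note that if $\mathcal{E}_1$ and $\mathcal{E}_2$ are depolarizing channels with fidelity parameters $p_1$ and $p_2$ respectively then,

$$
\left\| \mathcal{E}_1 - \mathcal{E}_2 \right\|_{\diamond} = \frac{2 |p_1 - p_2| (d^2 - 1)}{d^2}.
\tag{5.9}
$$

To see this note that

$$
q_0 = \frac{(d+1) F_{\mathcal{E}_1,\mathcal{I}} - 1}{d} = \frac{(d+1)\left( p_1 + \frac{1 - p_1}{d} \right) - 1}{d} = \frac{(d^2 - 1) p_1 + 1}{d^2}
\tag{5.10}
$$

and similarly,

$$
r_0 = \frac{(d^2 - 1) p_2 + 1}{d^2}.
\tag{5.11}
$$

Thus for every $1 \le i \le d^2 - 1$,

$$
q_i = \frac{1 - q_0}{d^2 - 1} = \frac{1 - p_1}{d^2}
\tag{5.12}
$$

and

$$
r_i = \frac{1 - r_0}{d^2 - 1} = \frac{1 - p_2}{d^2}.
\tag{5.13}
$$

So,

$$
\begin{aligned}
\left\| \mathcal{E}_1 - \mathcal{E}_2 \right\|_{\diamond} = \|v\|_1
&= |q_0 - r_0| + \sum_{i=1}^{d^2-1} |q_i - r_i| \\
&= \left| \frac{(d^2 - 1) p_1 + 1}{d^2} - \frac{(d^2 - 1) p_2 + 1}{d^2} \right| + (d^2 - 1) \left| \frac{1 - p_1}{d^2} - \frac{1 - p_2}{d^2} \right| \\
&= \frac{2 (d^2 - 1) |p_1 - p_2|}{d^2}.
\end{aligned}
\tag{5.14}
$$

B. Relating the Diamond Norm and Error Rate in Benchmarking

Now suppose that $\mathcal{E}_2 = \mathcal{I}$ in Eq. (5.4). Then, $r_0 = 1$ and for every $1 \le i \le d^2 - 1$, $r_i = 0$. Hence in this case,

$$
\left\| \mathcal{E}_1 - \mathcal{I} \right\|_{\diamond} = \|v\|_1 = |q_0 - 1| + 1 - q_0 = 2(1 - q_0).
\tag{5.15}
$$

We know that $q_0$ is related to the average fidelity of $\mathcal{E}_1$, $F_{\mathcal{E}_1,\mathcal{I}}$, by

$$
F_{\mathcal{E}_1,\mathcal{I}} = \frac{d\, q_0 + 1}{d + 1}
\tag{5.16}
$$

and so,

$$
\left\| \mathcal{E}_1 - \mathcal{I} \right\|_{\diamond} = \frac{2 (d+1) \left( 1 - F_{\mathcal{E}_1,\mathcal{I}} \right)}{d}.
\tag{5.17}
$$

Therefore in the case of randomized benchmarking (where we define the error rate $r = 1 - F_{\Lambda,\mathcal{I}}$) if $\Lambda$ is a generalized Pauli channel, r and $\|\Lambda - \mathcal{I}\|_{\diamond}$ are related by

$$
\left\| \Lambda - \mathcal{I} \right\|_{\diamond} = \frac{2 (d+1)\, r}{d}.
\tag{5.18}
$$

VI. SCALABILITY OF THE PROTOCOL
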
In this section we fill in the details of the scalability proof of our RB protocol that was briefly outlined in [25]. First, we note that the size of the Clifford group scales as $2^{O(n^2)}$ and so the number of sequences of length m scales as $2^{m O(n^2)}$. Hence if full averaging over the Clifford group is required for each sequence length, our protocol does not scale well in either of n or m. As mentioned in [25], there are three obstacles to overcome in order for the above protocol to be scalable:

1. Sequence length: Since the number of sequences of length m scales as $2^{m O(n^2)}$, averaging over all sequences for each m is clearly inefficient.

2. Uniform sampling: Since the size of the Clifford group scales as $2^{O(n^2)}$, sampling directly from a list of all Clifford elements becomes impossible for large n (writing down every element is inefficient in n).

3. Implementing Clifford operations: In practice, one can 
only implement a generating set for the Clifford group. 
Hence even if random sampling can be accomplished 
there must be a scalable method for implementing each 
Clifford using only this generating set. 

We now describe how to overcome each of the above ob- 
stacles. 



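Solution to 1: From Eq. (3.12), $\mathcal{F}_g(m, \psi)$ is the uniform average of the random variable

$$
\begin{aligned}
\mathcal{F}_g^{i_m}(m, |\psi\rangle) &:= \mathrm{tr}\left( \mathcal{S}_{i_m}(\rho_\psi) E_\psi \right) \\
&= \mathrm{tr}\left( \Lambda_{i_{m+1},m+1} \circ \mathcal{C}_{i_{m+1}} \circ \dots \circ \Lambda_{i_1,1} \circ \mathcal{C}_{i_1}(\rho_\psi) E_\psi \right)
\end{aligned}
\tag{6.1}
$$

over $|\mathrm{Clif}_n|^{m}$ sequences $(i_1, \dots, i_m)$. The benchmarking protocol requires choosing a sequence at random, evaluating the above fidelity, repeating for many sequences, and taking the average of the results.

Let $S_k(m, |\psi\rangle)$ be the normalized k-fold sum of the random variable $\mathcal{F}_g^{i_m}(m, |\psi\rangle)$, that is, $\frac{1}{k}$ times the sum of k independent samples of it, and note that $\mathbb{E}[S_k(m, |\psi\rangle)] = \mathcal{F}_g(m, \psi)$. A probabilistic bound on $|S_k(m, |\psi\rangle) - \mathcal{F}_g(m, \psi)|$ is given by Hoeffding's inequality,

$$
\mathbb{P}\left( \left| S_k(m, |\psi\rangle) - \mathcal{F}_g(m, \psi) \right| \ge \epsilon \right) \le 2 e^{-\frac{2 (k\epsilon)^2}{k (b-a)^2}} = 2 e^{-\frac{2 k \epsilon^2}{(b-a)^2}}
\tag{6.2}
$$

where [a, b] is the range of $\mathcal{F}_g^{i_m}(m, |\psi\rangle)$. Since $\mathcal{F}_g^{i_m}(m, |\psi\rangle)$ is a fidelity it must lie in [0, 1] (in reality it will lie in a much smaller interval; for now we continue to assume it lies in $[a, b] \subseteq [0, 1]$). Suppose we want

$$
\mathbb{P}\left( \left| S_k(m, |\psi\rangle) - \mathcal{F}_g(m, \psi) \right| \ge \epsilon \right) \le \delta
\tag{6.3}
$$

where $\epsilon$ represents the accuracy of the estimate and $1 - \delta$ represents the desired confidence level. We can find how many trials one needs to perform to obtain this accuracy by setting $\delta = 2 e^{-\frac{2 k \epsilon^2}{(b-a)^2}}$ and solving for k,

$$
k = \frac{\ln\left( \frac{2}{\delta} \right) (b-a)^2}{2 \epsilon^2}.
\tag{6.4}
$$

Note that k is explicitly independent of m and n which provides a solution to 1.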

It is instructive to obtain an estimate of the size of k for realistic parameter values of $\delta$ and $\epsilon$. Since $1 - \delta$ represents our desired confidence level we set $\delta = 0.05$. Fault-tolerance provides a wide range for the error tolerance of a physical (0-level) gate in the fault-tolerant construction. The value of the error tolerance depends on both the coding scheme as well as the noise model and typical values lie somewhere between $10^{-6}$ and $10^{-2}$. Let us assume that the physical gates have errors on the order of $10^{-4}$. Intuitively, since the fidelity curve decays in sequence length it is reasonable to assume that $\epsilon$ can be relaxed as m grows large. Similarly, b − a can be assumed to be relatively small for small values of m but will converge to $1 - \frac{1}{d}$ as m grows large. As a result both b − a and $\epsilon$ have an implicit dependence on m and this implicit dependence is advantageous when choosing $\epsilon$ for large values of m. Let us assume m = 100 and a fidelity decay curve that is well-approximated by an exponential. Then we expect fidelity values on the order of 0.99 at this value of m and so we take $\epsilon = 10^{-3}$, b − a = 0.2. With these values for $\epsilon$, $\delta$ and b − a we get,

$$
k = \frac{\ln\left( \frac{2}{0.05} \right) (0.2)^2}{2 (10^{-3})^2} \approx 7 \times 10^{4}.
\tag{6.5}
$$

While this number is large it is independent of n and thus compares favourably with quantum process tomography which scales as $16^{n}$. As a direct comparison, performing process tomography on a 4 qubit system already requires 65536 measurements.
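The estimate above follows directly from Eq. (6.4) and can be reproduced with a few lines of code. The function name and its `width` argument (standing for b − a) are hypothetical; the result is rounded up because the number of trials must be an integer.

```python
from math import ceil, log


def hoeffding_trials(eps, delta, width=1.0):
    """Number of sequences k from Eq. (6.4), rounded up to an integer.

    eps   : desired accuracy of the averaged fidelity
    delta : allowed failure probability (confidence level is 1 - delta)
    width : b - a, the range of the single-sequence fidelity
    """
    return ceil(log(2 / delta) * width**2 / (2 * eps**2))


# Parameter values used in the text.
k = hoeffding_trials(eps=1e-3, delta=0.05, width=0.2)

# Process tomography comparison for n = 4 qubits.
tomography_measurements = 16**4
```

Note that neither the number of qubits n nor the sequence length m appears among the arguments, which is the content of the solution to obstacle 1.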

Solution to 2:

For the second problem we present a method to scalably sample uniformly from the full Clifford group that utilizes the symplectic representation of the Clifford group (see Refs. [51], [52]). Since the Clifford group is the normalizer of the Pauli group, every Clifford element is completely determined by its action under conjugation on the Pauli group. In particular, since the Pauli group is generated by the set of all $X_i$ and $Z_i$ (the label i refers to X or Z being in the i'th position with identity operators elsewhere), an element of the Clifford group is completely determined by its action on this set. In the symplectic representation this corresponds to each Clifford element Q being associated uniquely to a 2n by 2n binary symplectic matrix C and a length 2n binary vector h which records negative signs in the images of $X_i$ and $Z_i$. The only constraints on Q are that commutation relations and Hermiticity of the generating set must be preserved under Q. Hence we can construct a random Clifford element Q by inductively constructing a random symplectic matrix C and vector h.

Since h corresponds to keeping track of negative signs, the binary entries of h can be chosen uniformly at random. C is inductively constructed column by column where the first n columns correspond to the images of $X_1$ through $X_n$, and the last n columns correspond to the images of $Z_1$ through $Z_n$ (all of which are written in binary notation as in [52]). Preservation of commutation relations is phrased through the symplectic inner product and so at each step one chooses the new column by finding a random solution to a system of linear equations which represents the inner product conditions. Since randomly choosing 2n elements of the Pauli group that satisfy the required commutation relations is equivalent to inductively choosing random solutions to 2n sets of linear equations (each of which requires $O(n^3)$ operations), we can produce a random Clifford element in $O(n^4)$ (classical) operations.
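The constraint that commutation relations be preserved can be stated compactly in code. The sketch below is only a checker for the symplectic condition, not the sampling algorithm described above, and its conventions are assumptions made for illustration: a Pauli on n qubits is written as a bit vector $(x_1, \dots, x_n \,|\, z_1, \dots, z_n)$, and a candidate matrix is given as a list of 2n columns, the images of $X_1, \dots, X_n$ followed by the images of $Z_1, \dots, Z_n$.

```python
from itertools import product


def symplectic_product(u, v):
    """Symplectic inner product over GF(2); it is 1 iff the two Paulis anticommute."""
    n = len(u) // 2
    return sum(u[i] * v[n + i] + u[n + i] * v[i] for i in range(n)) % 2


def is_symplectic(columns):
    """Check that the images of X_1..X_n, Z_1..Z_n keep the commutation relations.

    Column a must anticommute with column b exactly when they are the images
    of X_j and Z_j for the same qubit j, and commute otherwise.
    """
    n = len(columns) // 2
    for a in range(2 * n):
        for b in range(2 * n):
            expected = 1 if abs(a - b) == n else 0
            if symplectic_product(columns[a], columns[b]) != expected:
                return False
    return True


# Single-qubit examples in the (x | z) convention assumed here.
HADAMARD = [(0, 1), (1, 0)]   # X -> Z, Z -> X
PHASE = [(1, 1), (0, 1)]      # X -> Y, Z -> Z
# CNOT with control qubit 1 and target qubit 2, vectors ordered (x1, x2 | z1, z2).
CNOT = [(1, 1, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 1, 1)]

# Brute-force count of all 2 x 2 binary symplectic matrices (feasible only for tiny n).
bits = list(product((0, 1), repeat=2))
count_n1 = sum(is_symplectic([c1, c2]) for c1 in bits for c2 in bits)
```

The brute-force count is exponential in n and is included only to make the constraint tangible; the column-by-column construction in the text avoids enumeration by solving linear systems over GF(2).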

Solution to 3: Any Clifford element can be decomposed into a sequence of $O(n^2)$ one and two-qubit generators in $O(n^2)$ time [52] (alternatively, there are slower methods which produce a "canonical" decomposition into $O(n^2 / \log n)$ generators [53]). We describe this method which again utilizes the symplectic representation of the Clifford group. As mentioned above, every Clifford element Q is represented up to phase by a binary, symplectic matrix C and a binary vector h. The main goal is to decompose C into generators as the negative signs represented by h can be accounted for via multiplication by single-qubit Pauli operators. The main theorem used in the decomposition of Clifford elements is theorem 4 of [52] which states that if C is a binary symplectic matrix then C can be decomposed as a product of five binary symplectic matrices, which we denote by $T_1$ through $T_5$.

These symplectic matrices can be decomposed into symplectic matrices representing 1 and 2-qubit Clifford operations that correspond to Hadamards, single-qubit $\frac{\pi}{2}$-rotations about $\sigma_z$, two-qubit $\frac{\pi}{2}$-rotations about $\sigma_z \otimes \sigma_z$, two-qubit permutation operations and CNOT operations. The overall discussion can be condensed into the following main result:

Main Result: Every Clifford operation Q can be realized by a sequence of one and two-qubit Clifford operations which consists of the following six rounds of operations:

1. An initial round of single-qubit Pauli operators,

2. Applying a sequence of CNOT and two-qubit permutation operations,

3. Applying a sequence of $\frac{\pi}{2}$ rotations about $\sigma_z \otimes \sigma_z$ followed by a sequence of $\frac{\pi}{2}$ rotations about $\sigma_z$,

4. Applying Hadamard operations,

5. Applying a sequence of $\frac{\pi}{2}$ rotations about $\sigma_z \otimes \sigma_z$ followed by a sequence of $\frac{\pi}{2}$ rotations about $\sigma_z$,

6. Applying a final round of CNOT and two-qubit permutation operations.

Note that the operations within each of the rounds 3, 4 
and 5 all commute and can be performed in any order. 

The time-complexity in decomposing a symplectic matrix into the sequence of one and two-qubit Clifford operations given above is $O(n^3)$ since one needs to solve linear systems of equations to obtain $T_1$ through $T_5$. In many cases one would like to have a decomposition of a Clifford element into a particular generating set for the Clifford group, such as $G_n := \{H, S, \mathrm{CNOT}\}$ which consists of Hadamards (H) and phase gates (S) on each qubit, as well as CNOT gates on all pairs of qubits. There are $n^2 + n$ elements in $G_n$ and it is a straightforward process to decompose the operations in 1 through 6 above into H, S and CNOT gates.

In total, for an n-qubit system, we can efficiently choose Clifford gates uniformly at random and decompose each gate into a canonical subsequence of elements from the generating set $G_n$. The total time complexity of these two procedures is $O(n^4) + O(n^3) = O(n^4)$. The number of trials k one needs to perform to estimate $\mathcal{F}_g(m, \psi)$ to an accuracy $\epsilon$ with probability at least $1 - \delta$ is given by Eq. (6.4) which is independent of m and n. Thus if we perform the protocol for R different values of m, the total time complexity is

$$
O(n^4)\, \frac{R \ln(2/\delta)}{2 \epsilon^2}
\tag{6.6}
$$

which implies the protocol is scalable in n.


VII. DISCUSSION 

We have shown that randomized benchmarking pro- 
vides a scalable method for benchmarking the set of 
Clifford gates. The protocol allows for time and gate- 
dependent noise and the fitting models for the fidelity 
function take into account state preparation and mea- 
surement errors. In addition to providing an estimate of 
the average fidelity across all Clifford gates, the first or- 
der model provides a measure of the gate-dependence of 
the noise. 

We have provided here rigorous proofs of both the con- 
ditions for the validity of the protocol, as well as the scal- 
ability of the protocol in the number of qubits n com- 
prising the system. We have also established an exact 
relationship between the average fidelity estimate pro- 
vided by the protocol and a stronger characterization of 
the average error operator strength given by the diamond 
norm for the case of random Pauli errors. The proof of 
this relationship utilizes a semidefinite program for computing
the diamond norm [49] which has the potential to
establish further connections between these two notions 
of error strength. 

While benchmarking the full unitary group would be 
ideal, this is a provably inefficient task since just gener- 
ating a Haar-random unitary operator is inefficient in n. 
On the other hand as we have shown here benchmarking 
the Clifford group is an efficient task. It is not difficult to 
see that benchmarking the Clifford group provides signif- 
icant information for both fault-tolerant quantum com- 
putation as well as obtaining a benchmark for a gener- 
ating set of the full unitary group. First, any realistic 
implementation of a quantum computer will have to take 
advantage of error-correction codes in order to perform 
fault-tolerant quantum computation. The fact that most 
of the codes used in fault-tolerant theory are stabilizer 
codes implies that the encoding and decoding operations 
that have to be performed can be chosen to be Clifford 
operations. Hence a benchmark of Clifford operations 
provides direct information regarding the robustness of 
these encoding/decoding schemes. 

Second, the unitary group can be generated by adding just one single-qubit rotation not in the Clifford group (for instance the $\frac{\pi}{8}$-gate). Hence a benchmark for the Clifford group can actually provide useful information regarding a benchmark for a generating set of the full unitary group. In addition, it has been shown that any unitary operation can be implemented using Clifford gates, a single-qubit ancilla state called a magic state [54] and measurements in the computational basis. Hence in this model of quantum computation the only gates that need to be benchmarked for universal quantum computation are Clifford gates.

Various interesting questions and comments arise from 
the benchmarking analysis presented here. First, there 
is a key point to emphasize regarding the zeroth and 
first order fitting models. As depicted in [25] there exist
physically relevant noise models for which when the true 
value of the depolarization fidelity parameter p is used, 
the first order model fits the experimental data much 
better than the zeroth order model. However, it may 
be the case that a least squares fitting procedure using 
the functional form of the zeroth order model produces a 
very good fit to the experimental data, albeit producing 
an incorrect value for p. Therefore in order to obtain a 
more accurate value for p one should always use the first 
order fitting model unless prior knowledge of the noise 
indicates that it is effectively gate-independent.

It will be useful to obtain a better understanding for 
when a least squares fitting procedure using the zeroth 
order model produces a value for p that is close to its true 
value. Clearly in the gate-independent case the zeroth or- 
der model fits the fidelity decay curve exactly. Moreover 
for weakly gate-dependent noise one can see from our 
continuity argument that the zeroth order model is still 
a sufficient fitting function for the fidelity decay curve. 
Hence the most interesting case to analyze is when there 
is a non-negligible amount of gate-dependence in the 
noise and the condition for using the first order model to 
fit the decay curve is satisfied. A useful test that would 
indicate gate-dependence in the noise, and thus the va- 
lidity of the value of p obtained from fitting to the zeroth 
order model, is to perform the least squares fitting proce- 
dure using both the zeroth and first order fitting models. 
If the estimates of p obtained in each case differ signifi- 
cantly then the zeroth order model must be a poor choice 
of fitting function even though it may fit the data well. In 
this case the noise must have a strong gate-dependence 
because otherwise q — p 2 would be small which implies 
the two fitting functions would produce similar estimates 
for p. 

An interesting question is how to extract a meaningful 
average error rate over a generating set of the Clifford 
group, for instance G n defined previously, from the aver- 
age error rate r over the entire Clifford group. One might 
argue that benchmarking a generating set for the Clif- 
ford group is sufficient for benchmarking the full Clifford 
group, however it is entirely plausible that noise correlations between the n physical qubits create large errors on elements of $\mathrm{Clif}_n$, even when the errors on the generating set can be controlled [55]. In fact an assumption that is often made in fault-tolerant estimates is that the correlation in noise between qubits is either small or can be ignored.

With regards to scalability, while we have shown the 
protocol itself is scalable in n, a useful direction for fur- 
ther research would be an analysis of how the sufficient 
condition of weak average variation of the noise depends 
on n. As previously noted, the noise associated to a 
multi-qubit Clifford element is given by the noise associ- 
ated to the sequence of generators comprising the Clif- 
ford. A determination of whether these noise operators 
continue to satisfy the sufficient condition when it is met 
for small numbers of qubits will be useful for understand- 
ing the applicability of the protocol. 

Rigorous fault-tolerant analyses sometimes invoke the
diamond norm as a measure of the error strength rather
than the weaker characterization provided by the average
fidelity. Hence it is desirable to find relationships
between these two quantities that are more general than
the special case of random Pauli errors presented here.
As mentioned above, the semidefinite program we have 
used to deduce the relationship appears to be a promising 
tool for further research in this area. From the expression 
given in Eq. (2.2) one can see that the diamond norm is
essentially a "worst-case" maximization over input (en- 
tangled) states. In quantum computation it is the case 
that the measure of accessible states (states that can be 
reached in polynomial time using a generating set for the 
unitary group) is equal to 0. Hence there is a high probability
that the maximization criterion demanded by the diamond
norm is a much stronger condition than necessary
for understanding the strength of the errors affecting the
computation. This point becomes even more relevant for
an algorithm-specific (i.e. non-universal) quantum computer.
An interesting direction of further research is to
provide precise conditions for when the average fidelity 
provides an indication or bound on the error strength in 
terms of stronger characterizations such as the diamond 
norm. 

Additionally, if one were able to obtain an estimate of the minimum gate fidelity from knowledge of the average fidelity, one could use the direct relationship between the minimum gate fidelity and diamond norm given by Eq. (2.21) to obtain information about the error strength in terms of the diamond norm. A result that may be useful in this direction of research is the "concentration of measure effect" of the gate fidelity which implies that as n increases, the measure of the set of states which produce a fidelity close to the minimum yet far from the average is exponentially small in n [41], [42].